Description
A harder custom bytecode VM with a richer instruction set than Virtual Machine 0. You must either fully emulate execution in Python or invert the bytecode transformation to determine what input the VM accepts.
Download the VM binary and bytecode file.
Load the binary into Ghidra for static analysis.
    wget https://artifacts.picoctf.net/c/509/vm1
    chmod +x vm1
Solution
- Step 1: Diff the dispatch loop against VM 0
VM 1 reuses most of VM 0's opcodes and adds a handful. Confirming the overlap upfront saves you from re-reversing instructions you already understand.
Walk the dispatch switch in Ghidra and tag each case as "reused" or "new" relative to your VM 0 opcode table. A typical diff looks like:
- Reused from VM 0: LOAD_IMM, ADD, SUB, CMP, JEQ, READ, HALT.
- New in VM 1: bitwise ops (XOR, AND, OR, SHL, SHR), multiply/divide, indirect memory load/store (LDM, STM), and an unconditional JMP.
- Register file changes: the register array often grows from 4 to 8 or 16 slots, and width may widen from uint8_t to uint32_t; check the array type in the decompiler.
For the underlying Ghidra workflow when you hit an unfamiliar pattern, see Ghidra Reverse Engineering.
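The reused/new tagging described above can be kept as data and not only as notes. The sketch below is a minimal illustration: the byte values in VM0_OPCODES and VM1_CASES are placeholders chosen for the example, not the real encodings, and diff_opcodes is a hypothetical helper name.

```python
# Hypothetical opcode tables: these byte values are placeholders, not the
# real VM 0 / VM 1 encodings. Fill them in from the Ghidra switch statement.
VM0_OPCODES = {
    0x01: "LOAD_IMM",
    0x02: "ADD",
    0x03: "SUB",
    0x04: "CMP",
    0x10: "JEQ",
    0x20: "READ",
    0xFF: "HALT",
}

# Case values seen in VM 1's dispatch switch (again, placeholders).
VM1_CASES = [0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x10, 0x11, 0x20, 0xFF]


def diff_opcodes(known, cases):
    """Split switch cases into (reused, new) relative to a known opcode table."""
    reused = {op: known[op] for op in cases if op in known}
    new = sorted(op for op in cases if op not in known)
    return reused, new


reused, new = diff_opcodes(VM0_OPCODES, VM1_CASES)
for op, name in sorted(reused.items()):
    print(f"0x{op:02X} reused  {name}")
for op in new:
    print(f"0x{op:02X} new     <reverse me>")
```

A matching byte value is only a hint that the opcode was reused. Confirm in the decompiler that the handler body matches before trusting the old name.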
- Step 2: Skeleton (dispatch dictionary and argparse driver)
Lay down the emulator skeleton before reversing every opcode. A dispatch dict plus an argparse-driven main is enough to start tracing the bytecode the moment the first few opcodes are correct.
    python3 vm1_emulator.py <BYTECODE_FILE> --trace
Five lines that cover the dispatch dict; copy this verbatim and fill in handlers as you reverse them:
    HANDLERS = {
        0x01: lambda vm, a, b: vm.set_reg(a, b),                        # LOAD_IMM
        0x02: lambda vm, a, b: vm.set_reg(a, vm.regs[a] + vm.regs[b]),  # ADD
        0x05: lambda vm, a, b: vm.set_reg(a, vm.regs[a] ^ vm.regs[b]),  # XOR (new)
        0x10: lambda vm, t: vm.jump_if_eq(t),                           # JEQ
        0xFF: lambda vm: vm.halt(),                                     # HALT
    }
And the argparse setup so --trace works on day one without you writing it from scratch:
    import argparse
    import sys

    def main():
        p = argparse.ArgumentParser()
        p.add_argument("bytecode", help="path to extracted bytecode blob")
        p.add_argument("--trace", action="store_true", help="print every instruction")
        p.add_argument("--input", default="", help="VM stdin (one byte per char)")
        args = p.parse_args()
        # Read the whole bytecode blob up front; it is small.
        with open(args.bytecode, "rb") as f:
            bc = f.read()
        # VM is the emulator class you write around HANDLERS.
        vm = VM(bc, stdin=args.input, trace=args.trace)
        vm.run()
        if args.trace:
            print("final regs:", vm.regs, file=sys.stderr)

    if __name__ == "__main__":
        main()
When the emulator reaches an unknown opcode it should print the PC, the offending byte, and the current register state, then halt. That diagnostic is your work queue for the next reversing pass. For the broader Python-in-CTF toolkit, see Python for CTF.
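The driver above expects a VM class with set_reg, jump_if_eq, halt, and run. One possible shape for it is sketched here. Everything beyond the five handlers is an assumption for illustration: the encoding (one opcode byte followed by fixed one-byte operands), the OPERAND_COUNT table, the 8-slot 32-bit register file, and the zero flag are all hypothetical and need to be checked against the real dispatch loop.

```python
import sys

# Same five handlers as in the walkthrough.
HANDLERS = {
    0x01: lambda vm, a, b: vm.set_reg(a, b),                        # LOAD_IMM
    0x02: lambda vm, a, b: vm.set_reg(a, vm.regs[a] + vm.regs[b]),  # ADD
    0x05: lambda vm, a, b: vm.set_reg(a, vm.regs[a] ^ vm.regs[b]),  # XOR
    0x10: lambda vm, t: vm.jump_if_eq(t),                           # JEQ
    0xFF: lambda vm: vm.halt(),                                     # HALT
}

# ASSUMPTION: each opcode is followed by this many one-byte operands.
OPERAND_COUNT = {0x01: 2, 0x02: 2, 0x05: 2, 0x10: 1, 0xFF: 0}


class VM:
    def __init__(self, bytecode, stdin="", trace=False, nregs=8):
        self.code = bytes(bytecode)
        self.stdin = stdin      # consumed by a READ handler once you add one
        self.trace = trace
        self.regs = [0] * nregs
        self.pc = 0
        self.zero = False       # set by a CMP handler once you add one
        self.running = True

    def set_reg(self, idx, value):
        self.regs[idx] = value & 0xFFFFFFFF  # ASSUMPTION: 32-bit registers

    def jump_if_eq(self, target):
        if self.zero:
            self.pc = target

    def halt(self):
        self.running = False

    def run(self):
        while self.running and self.pc < len(self.code):
            op = self.code[self.pc]
            if op not in HANDLERS:
                # Diagnostic for the next reversing pass: PC, byte, registers.
                print(f"unknown opcode 0x{op:02X} at pc={self.pc} "
                      f"regs={self.regs}", file=sys.stderr)
                self.halt()
                break
            n = OPERAND_COUNT[op]
            args = self.code[self.pc + 1 : self.pc + 1 + n]
            if len(args) < n:
                print(f"truncated instruction at pc={self.pc}", file=sys.stderr)
                self.halt()
                break
            if self.trace:
                print(f"{self.pc:04x}: op=0x{op:02X} args={list(args)}",
                      file=sys.stderr)
            self.pc += 1 + n    # advance first so jumps can overwrite pc
            HANDLERS[op](self, *args)


# LOAD_IMM r0, 5; LOAD_IMM r1, 3; ADD r0, r1; XOR r1, r0; HALT
demo = bytes([0x01, 0, 5, 0x01, 1, 3, 0x02, 0, 1, 0x05, 1, 0, 0xFF])
vm = VM(demo)
vm.run()
print(vm.regs[:2])  # [8, 11]
```

Advancing the PC before calling the handler keeps jump handlers simple: they overwrite pc and the loop continues from the target.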
- Step 3: Trace execution to identify the accepted input
Run the emulator with --trace and read the comparison instructions. The constant loaded into the register that gets compared against your input is the value the VM expects.
    python3 vm1_emulator.py <BYTECODE_FILE> --trace --input 'A'
    # Or brute force when the input is a single byte (the nested printf emits the raw byte, not the text "\xNN"):
    for i in $(seq 0 255); do printf "$(printf '\\x%02x' "$i")" | ./vm1 | grep -q picoCTF && echo "$i"; done
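Reading the trace by eye works for a handful of comparisons. For longer traces, the lookup of "which constant is compared against my input" can be scripted. The sketch below assumes a hypothetical trace format, a list of (mnemonic, operands) tuples in execution order, which is not something the walkthrough's emulator emits by default; expected_values is an illustrative helper name.

```python
def expected_values(trace):
    """Return the constants that CMP instructions check against READ registers.

    `trace` is a list of (mnemonic, operands) tuples in execution order
    (an assumed format; adapt it to whatever your --trace mode prints).
    """
    last_imm = {}        # register -> most recent LOAD_IMM constant
    from_input = set()   # registers last written by READ
    found = []
    for mnemonic, ops in trace:
        if mnemonic == "LOAD_IMM":
            reg, imm = ops
            last_imm[reg] = imm
            from_input.discard(reg)
        elif mnemonic == "READ":
            (reg,) = ops
            from_input.add(reg)
            last_imm.pop(reg, None)
        elif mnemonic == "CMP":
            a, b = ops
            # Accept either operand order: constant vs input, or input vs constant.
            for const_reg, input_reg in ((a, b), (b, a)):
                if const_reg in last_imm and input_reg in from_input:
                    found.append(last_imm[const_reg])
    return found


trace = [
    ("LOAD_IMM", (3, 0x42)),
    ("READ", (4,)),
    ("CMP", (3, 4)),
    ("JEQ", (0x20,)),
]
print([hex(v) for v in expected_values(trace)])  # ['0x42']
```

This only covers a direct comparison. If the bytecode transforms the input (XOR, shifts, arithmetic) before the CMP, the recovered constant is the transformed value and the transformation still has to be inverted.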
The trace will show an instruction sequence like LOAD_IMM r3, 0x42 followed by READ r4 followed by CMP r3, r4 followed by JEQ <success>. The immediate (0x42 in this example) is the value the VM accepts. Feed it to the binary and the comparison succeeds.
Should you reach for angr? Probably not yet.
angr can solve this kind of constraint automatically, but its learning curve is steep, the symbolic-execution model has plenty of footguns (state explosion, missing function summaries, slow paths through libc), and getting it to load a custom VM correctly is itself a research project. Get the manual emulator working first; if it ever stops scaling (multi-byte input, large search space), then graduate to angr with a working oracle as your sanity check.
- Step 4: Provide the accepted input to get the flag
Supply the value identified by your emulator to the actual VM binary. It validates the input and prints the flag.
    echo '<ACCEPTED_INPUT>' | ./vm1
With the correct input identified through emulation or symbolic execution, the VM's comparison succeeds, the jump takes the success branch, and the flag is printed. This confirms that your emulator faithfully reproduced the VM's behavior.
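The final check can also be scripted from Python, which makes it easy to cross-check the emulator's answer against the real binary. This is a minimal sketch: run_binary and accepted_bytes are hypothetical helper names, and it assumes ./vm1 reads its input from stdin and prints a string containing picoCTF on success, as the shell one-liner in Step 3 does.

```python
import subprocess


def run_binary(path, payload, timeout=5):
    """Feed `payload` (bytes) to the binary's stdin and return its stdout."""
    proc = subprocess.run(
        [path],
        input=payload,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return proc.stdout


def accepted_bytes(oracle, marker=b"picoCTF"):
    """Try every single-byte input; return those whose output contains marker.

    `oracle` is any callable mapping input bytes to output bytes, so the same
    search can drive the real binary or your emulator.
    """
    return [i for i in range(256) if marker in oracle(bytes([i]))]


# Usage against the real binary (ASSUMPTION: ./vm1 takes its input on stdin):
#   print(accepted_bytes(lambda data: run_binary("./vm1", data)))
```

Passing the oracle as a callable means the emulator and the binary can be run through the same search. If the two disagree on which bytes are accepted, the emulator still has a mis-reversed opcode.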
Writing a full VM emulator is a powerful reversing skill. It applies directly to malware analysis (custom packers and obfuscators frequently use VMs to hide payloads), game hacking (game scripting engines), and vulnerability research (firmware with custom instruction sets). The methodology is the same regardless of the target: opcode table, emulator, trace, extract.
Flag
picoCTF{...}
This challenge was not solved during the competition. Follow the steps above to reproduce the solution.