Virtual Machine 1 picoCTF 2023 Solution

Published: April 26, 2023

Description

A harder custom bytecode VM with a richer instruction set than Virtual Machine 0. You must either fully emulate execution in Python or invert the bytecode transformation to determine what input the VM accepts.

Download the VM binary and bytecode file.

Load the binary into Ghidra for static analysis.

bash
wget https://artifacts.picoctf.net/c/509/vm1
chmod +x vm1
  1. Step 1: Diff the dispatch loop against VM 0
    VM 1 reuses most of VM 0's opcodes and adds a handful. Confirming the overlap upfront saves you from re-reversing instructions you already understand.

    Walk the dispatch switch in Ghidra and tag each case as "reused" or "new" relative to your VM 0 opcode table. A typical diff looks like:

    • Reused from VM 0: LOAD_IMM, ADD, SUB, CMP, JEQ, READ, HALT.
    • New in VM 1: bitwise ops (XOR, AND, OR, SHL, SHR), multiply/divide, indirect memory load/store (LDM, STM), and an unconditional JMP.
    • Register file changes: the register array often grows from 4 to 8 or 16 slots, and width may widen from uint8_t to uint32_t; check the array type in the decompiler.

    For the underlying Ghidra workflow when you hit an unfamiliar pattern, see Ghidra Reverse Engineering.
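
    The reused/new tagging can be kept as data rather than notes. The sketch below is one way to do it, assuming you already have a VM 0 opcode table as a dict. The opcode byte values and the helper name diff_opcodes are illustrative placeholders, not values read from the binary.

```python
# Hypothetical opcode tables: the byte values are placeholders for illustration.
VM0_OPCODES = {
    0x01: "LOAD_IMM", 0x02: "ADD", 0x03: "SUB", 0x04: "CMP",
    0x10: "JEQ", 0x20: "READ", 0xFF: "HALT",
}

# Start from the VM 0 table and add what the VM 1 dispatch switch reveals.
VM1_OPCODES = dict(VM0_OPCODES)
VM1_OPCODES.update({0x05: "XOR", 0x06: "AND", 0x11: "JMP"})


def diff_opcodes(old, new):
    """Split the new table into opcodes already understood and opcodes to reverse."""
    reused = {op: name for op, name in new.items() if old.get(op) == name}
    added = {op: name for op, name in new.items() if op not in reused}
    return reused, added


reused, added = diff_opcodes(VM0_OPCODES, VM1_OPCODES)
for op, name in sorted(added.items()):
    print(f"new    0x{op:02x} {name}")
for op, name in sorted(reused.items()):
    print(f"reused 0x{op:02x} {name}")
```

    An opcode whose byte value matches VM 0 but whose meaning changed lands in the "new" bucket, which is where you want it.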

  2. Step 2: Skeleton: dispatch dictionary and arg-parse driver
    Lay down the emulator skeleton before reversing every opcode. A dispatch dict plus an argparse-driven main is enough to start tracing the bytecode the moment the first few opcodes are correct.
    bash
    python3 vm1_emulator.py <BYTECODE_FILE> --trace

    Five lines that cover the dispatch dict; copy this verbatim and fill in handlers as you reverse them:

    HANDLERS = {
        0x01: lambda vm, a, b: vm.set_reg(a, b),                        # LOAD_IMM
        0x02: lambda vm, a, b: vm.set_reg(a, vm.regs[a] + vm.regs[b]),  # ADD
        0x05: lambda vm, a, b: vm.set_reg(a, vm.regs[a] ^ vm.regs[b]),  # XOR (new)
        0x10: lambda vm, t:    vm.jump_if_eq(t),                        # JEQ
        0xFF: lambda vm:       vm.halt(),                               # HALT
    }

    And the argparse setup so --trace works on day one without you writing it from scratch:

    import argparse, sys
    
    def main():
        p = argparse.ArgumentParser()
        p.add_argument("bytecode", help="path to extracted bytecode blob")
        p.add_argument("--trace", action="store_true", help="print every instruction")
        p.add_argument("--input", default="", help="VM stdin (one byte per char)")
        args = p.parse_args()
    
        with open(args.bytecode, "rb") as f:  # close the file once the blob is read
            bc = f.read()
        vm = VM(bc, stdin=args.input, trace=args.trace)
        vm.run()
        if args.trace:
            print("final regs:", vm.regs, file=sys.stderr)
    
    if __name__ == "__main__":
        main()

    When the emulator reaches an unknown opcode it should print the PC, the offending byte, and the current register state, then halt. That diagnostic is your work queue for the next reversing pass. For the broader Python-in-CTF toolkit, see Python for CTF.
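
    The unknown-opcode diagnostic described above can be sketched as a small VM class. This is a minimal sketch under stated assumptions, not the binary's actual layout: each instruction is assumed to be one opcode byte followed by one-byte operands, registers are assumed to be 32 bits wide, and the register count of 8 is a guess. The opcode values reuse the ones from the dispatch dict above, and the ops table layout (mnemonic, operand count, handler) is my own convention.

```python
import sys


class VM:
    """Emulator skeleton. Assumes one opcode byte followed by one-byte operands."""

    def __init__(self, bytecode, stdin="", trace=False, nregs=8):
        self.bc = bytecode
        self.stdin = stdin      # reserved for a READ handler once it is reversed
        self.trace = trace
        self.regs = [0] * nregs
        self.pc = 0
        self.running = True
        # opcode -> (mnemonic, operand count, handler); extend as you reverse more
        self.ops = {
            0x01: ("LOAD_IMM", 2, lambda a, b: self.set_reg(a, b)),
            0x02: ("ADD", 2, lambda a, b: self.set_reg(a, self.regs[a] + self.regs[b])),
            0xFF: ("HALT", 0, self.halt),
        }

    def set_reg(self, idx, value):
        self.regs[idx] = value & 0xFFFFFFFF  # assumed 32-bit registers

    def halt(self):
        self.running = False

    def run(self):
        while self.running and self.pc < len(self.bc):
            op = self.bc[self.pc]
            if op not in self.ops:
                # The diagnostic: PC, offending byte, register state, then stop.
                print(f"unknown opcode 0x{op:02x} at pc={self.pc} regs={self.regs}",
                      file=sys.stderr)
                break
            name, argc, handler = self.ops[op]
            args = self.bc[self.pc + 1 : self.pc + 1 + argc]
            if self.trace:
                print(f"{self.pc:04x} {name} {list(args)}", file=sys.stderr)
            self.pc += 1 + argc  # advance before the handler so jumps can overwrite pc
            handler(*args)
```

    Storing the operand count next to each handler lets the run loop decode operands without inspecting the handler itself. On an unknown opcode, pc is left pointing at the offending byte so you can look it up in Ghidra directly.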

  3. Step 3: Trace execution to identify the accepted input
    Run the emulator with --trace and read the comparison instructions. The constant loaded into the register that gets compared against your input is the value the VM expects.
    bash
    python3 vm1_emulator.py <BYTECODE_FILE> --trace --input 'A'
    bash
    # Or brute force when the input is a single byte:
    for i in $(seq 0 255); do printf '%b' "$(printf '\\x%02x' "$i")" | ./vm1 | grep -q picoCTF && echo "$i"; done

    The trace will show an instruction sequence like LOAD_IMM r3, 0x42 followed by READ r4 followed by CMP r3, r4 followed by JEQ <success>. The immediate (0x42 in this example) is the value the VM accepts. Feed it to the binary and the comparison succeeds.
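
    If the trace is long, the same pattern can be picked out mechanically. The sketch below assumes your emulator can hand back decoded instructions as tuples of (mnemonic, operands...). That tuple format and the function name find_expected_bytes are assumptions made for illustration, and the sample trace simply mirrors the example sequence above.

```python
def find_expected_bytes(trace):
    """Return immediates from LOAD_IMM rX, imm / READ rY / CMP rX, rY / JEQ runs."""
    expected = []
    for i in range(len(trace) - 3):
        load, read, cmp_, jeq = trace[i : i + 4]
        mnemonics = (load[0], read[0], cmp_[0], jeq[0])
        if mnemonics != ("LOAD_IMM", "READ", "CMP", "JEQ"):
            continue
        const_reg, imm = load[1], load[2]
        input_reg = read[1]
        # Only count it when CMP really compares the constant with the input.
        if set(cmp_[1:3]) == {const_reg, input_reg}:
            expected.append(imm)
    return expected


# Sample trace mirroring the example sequence; the jump target is a placeholder.
trace = [
    ("LOAD_IMM", 3, 0x42),
    ("READ", 4),
    ("CMP", 3, 4),
    ("JEQ", 0x30),
]
print(bytes(find_expected_bytes(trace)))
```

    A real VM 1 trace may transform the input (for example with XOR) before the compare, in which case this scan finds nothing and you need to invert the transformation by hand.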

    Should you reach for angr? Probably not yet. angr can solve this kind of constraint automatically, but its learning curve is steep, the symbolic-execution model has plenty of footguns (state explosion, missing function summaries, slow paths through libc), and getting it to load a custom VM correctly is itself a research project. Get the manual emulator working first; if it ever stops scaling (multi-byte input, large search space), then graduate to angr with a working oracle as your sanity check.

  4. Step 4: Provide the accepted input to get the flag
    Supply the value identified by your emulator to the actual VM binary. It validates the input and prints the flag.
    bash
    echo '<ACCEPTED_INPUT>' | ./vm1

    With the correct input identified through emulation or symbolic execution, the VM's comparison succeeds, the jump takes the success branch, and the flag is printed. This confirms that your emulator faithfully reproduced the VM's behavior.
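
    Feeding candidates to the real binary can also be scripted, which makes it easy to use the binary as an oracle for your emulator. This is a minimal sketch using only the standard library. The helper names try_input and brute_force_byte are hypothetical, and it assumes the VM reads its input from stdin and prints something containing picoCTF on success.

```python
import subprocess


def try_input(cmd, data, marker=b"picoCTF", timeout=5):
    """Feed data to cmd on stdin; return its stdout if marker appears, else None."""
    proc = subprocess.run(
        cmd,
        input=data,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return proc.stdout if marker in proc.stdout else None


def brute_force_byte(cmd):
    """Try every single-byte input; return (value, output) for the first hit."""
    for value in range(256):
        out = try_input(cmd, bytes([value]))
        if out is not None:
            return value, out
    return None
```

    Calling try_input(["./vm1"], b"<ACCEPTED_INPUT>") checks one candidate, and brute_force_byte(["./vm1"]) covers the single-byte case. Passing bytes rather than a string avoids the trailing newline that echo adds, which matters if the VM compares every byte it reads.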

    Writing a full VM emulator is a powerful reversing skill. It applies directly to malware analysis (custom packers and obfuscators frequently use VMs to hide payloads), game hacking (game scripting engines), and vulnerability research (firmware with custom instruction sets). The methodology is the same regardless of the target: opcode table, emulator, trace, extract.

Flag

picoCTF{...}

This challenge was not solved during the competition. Follow the steps above to reproduce the solution.
