Python is the most reversible compiled language in CTF
Here is the honest take: Python reversing is easier than reversing C, not harder. The bytecode is documented. The standard library ships a disassembler. Even frozen executables are just zipped .pyc files with a stub in front. Every secret the program uses at runtime is sitting somewhere in that artifact, because Python has to be able to load it.
The confusion comes from not knowing which tool to reach for. Someone who has never seen a .pyc file googles for ten minutes, gets three conflicting answers about uncompyle6 and pycdc, and gives up. Someone who has done it once opens the file, checks the magic bytes, picks the right tool, and has source in 30 seconds. This guide is the 30-second version.
Every secret a Python program uses at runtime has to be present in the artifact. The question is just where.
Python challenges in picoCTF fall into three categories: source code with obfuscation layered on top, compiled bytecode (.pyc files), and frozen executables that bundle the entire interpreter (PyInstaller, py2exe, cx_Freeze). Each requires a different opening move, but the logic after that is nearly identical. Identify the type, extract the real logic, read the flag comparison.
Three-second triage: what are you actually looking at
Run file ./chall and strings ./chall | head -20 before doing anything else. Those two commands tell you which category you are in.
| What you see | Type | First move |
|---|---|---|
| file: ASCII text, .py extension | Python source | Read it; check for exec( or eval( calls |
| magic bytes: xx 0d 0d 0a | Compiled bytecode (.pyc) | Disassemble with the dis module (via marshal) or run a decompiler |
| strings: "PyInstaller" or "_MEIPASS" | Frozen executable | python pyinstxtractor.py ./chall |
| file: Zip archive, .pyz extension | Python zip archive | unzip chall.pyz -d out/ then decompile the .pyc inside |
| strings: "UPX!" alongside "PyInstaller" | UPX-packed frozen binary | upx -d ./chall first, then pyinstxtractor |
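When file and strings are not at hand, the same triage can be approximated in Python. The following is a rough sketch, not a replacement for those tools: the function name triage and its category labels are invented for illustration, and the marker strings are simply the ones from the table.

```python
def triage(data: bytes) -> str:
    """Guess the challenge category from raw file bytes (heuristic only)."""
    # Frozen executables: the marker strings from the table above.
    if b"PyInstaller" in data or b"_MEIPASS" in data:
        if b"UPX!" in data:
            return "UPX-packed frozen binary"
        return "frozen executable"
    # .pyc: a two-byte magic number followed by 0d 0a.
    if len(data) >= 4 and data[2:4] == b"\r\n":
        return "compiled bytecode (.pyc)"
    # Zip archives normally start with the local file header signature.
    if data[:4] == b"PK\x03\x04":
        return "zip archive"
    # Plain source that decodes and runs a payload.
    if b"exec(" in data or b"eval(" in data:
        return "source with exec/eval layer"
    return "unknown or plain source"


print(triage(bytes.fromhex("550d0d0a") + b"\x00" * 12))
```

Treat the result as a hint. A .pyz built with a shebang line does not start with PK, and a text file can coincidentally have 0d 0a at offset 2.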
Spotting a .pyc file without an extension requires checking the first four bytes. Every .pyc starts with a two-byte magic number followed by 0d 0a. Run xxd chall | head -1 and compare the first two bytes against the table below to identify the Python version before you pick a decompiler.
```bash
# Quick check: is this a .pyc?
xxd chall | head -1
# Output: 00000000: 550d 0d0a ...   <-- Python 3.8 bytecode

# Python version from first two bytes (little-endian magic)
# 3.8  -> 55 0d    3.9  -> 61 0d
# 3.10 -> 6f 0d    3.11 -> a7 0d
# 3.12 -> cb 0d    3.13 -> f3 0d

# Or print the magic of the interpreter you are running, for comparison:
python3 -c "import importlib.util; print(importlib.util.MAGIC_NUMBER.hex())"
```
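The lookup can also be scripted. This is a minimal sketch that covers only the 3.8 to 3.12 values listed above; the names MAGIC_TO_VERSION and pyc_version are made up for the example. The two magic bytes are read as a little-endian 16-bit integer, which is why 55 0d becomes 3413.

```python
import struct

# Magic numbers for Python 3.8 to 3.12 (stored little-endian in the file).
MAGIC_TO_VERSION = {
    3413: "3.8",   # 55 0d
    3425: "3.9",   # 61 0d
    3439: "3.10",  # 6f 0d
    3495: "3.11",  # a7 0d
    3531: "3.12",  # cb 0d
}


def pyc_version(header: bytes):
    """Return the Python version for a .pyc header, or None if it is not a .pyc."""
    if len(header) < 4 or header[2:4] != b"\r\n":
        return None
    (magic,) = struct.unpack("<H", header[:2])
    return MAGIC_TO_VERSION.get(magic, f"unlisted magic {magic}")


print(pyc_version(bytes.fromhex("550d0d0a")))  # 3.8
```

In practice you would pass the first four bytes of the challenge file, for example open('chall', 'rb').read(4).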
Run file and xxd before assuming anything from the name. weirdSnake in picoCTF 2024 is a classic example: the file is named snake with no extension, but the magic bytes immediately identify it as Python bytecode.
Reading bytecode with dis: the universal fallback
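The Python standard library ships a disassembler, so there is nothing to install. If a decompiler fails, you can still unmarshal the code object from the .pyc and pass it to dis.dis. Two caveats apply. First, python3 -m dis expects a source file and tries to compile whatever it is given, so a .pyc has to go through marshal instead. Second, both marshal and dis follow the running interpreter's bytecode format, so run them under the same Python version that produced the file (the magic bytes tell you which). dis is also the right tool when you want to see the exact instructions rather than an approximate source reconstruction.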
```python
# Save as disasm.py and run with the Python version that compiled the file:
#   python3 disasm.py | less
import dis
import marshal

with open('chall.pyc', 'rb') as f:
    f.read(16)  # skip 16-byte header (Python 3.7+; 12 bytes for 3.3-3.6, 8 for older)
    code = marshal.loads(f.read())

dis.dis(code)
```
Most CTF flag checkers boil down to four opcodes. Learn these and you can read a flag comparison without reconstructing full source:
| Opcode | Meaning | Why CTF cares |
|---|---|---|
| LOAD_CONST | Push a literal value onto the stack | Keys, ciphertext arrays, and expected values live here |
| BUILD_LIST / BUILD_STRING | Assemble N items from the stack into a list or string | A long sequence of LOAD_CONST followed by BUILD_LIST is the ciphertext array |
| COMPARE_OP | Compare top two stack values (==, !=, etc.) | The flag comparison. The expected value sits on the stack just before this |
| BINARY_XOR / BINARY_OP | Apply a binary operator to the top two stack values | Tells you exactly how the cipher works without reading any source |
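To make the table concrete, here is a small sketch that filters a disassembly down to those opcodes. The checker source is a hypothetical toy compiled in-process, and the helper iter_code_objects is an illustrative name, not a standard-library function. Exact opcode names vary a little between Python versions, which is why both BINARY_XOR and BINARY_OP are in the filter.

```python
import dis

# Hypothetical toy checker, compiled in-process so the example is self-contained.
SOURCE = '''
def check(guess):
    return guess == "s3cret"
'''


def iter_code_objects(code):
    """Yield a code object and every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        if hasattr(const, "co_code"):
            yield from iter_code_objects(const)


INTERESTING = {"LOAD_CONST", "BUILD_LIST", "BUILD_STRING",
               "COMPARE_OP", "BINARY_XOR", "BINARY_OP"}

hits = []
for co in iter_code_objects(compile(SOURCE, "<toy>", "exec")):
    for ins in dis.get_instructions(co):
        if ins.opname in INTERESTING:
            hits.append((co.co_name, ins.opname, ins.argrepr))

for name, opname, argrepr in hits:
    print(f"{name:10} {opname:12} {argrepr}")
```

The LOAD_CONST carrying 's3cret' shows up right before the COMPARE_OP inside check, which is the shape to look for in a real flag checker.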
The picoCTF 2024 challenge weirdSnake is the canonical bytecode XOR example. The file has no extension, but xxd snake | head -1 reveals a .pyc magic number immediately. Disassembling it with dis shows two key sequences:
```
# Key string built character by character:
  2           0 LOAD_CONST               0 ('t')
              2 LOAD_CONST               1 ('_')
              4 LOAD_CONST               2 ('J')
              6 LOAD_CONST               3 ('o')
              8 LOAD_CONST               4 ('3')
             10 BUILD_STRING             5
             12 STORE_NAME               0 (key_str)

# Ciphertext integers in a list:
  4           0 LOAD_CONST               0 (4)
              2 LOAD_CONST               1 (54)
              4 LOAD_CONST               2 (41)
              ...
             XX BUILD_LIST               N
             XX STORE_NAME               1 (input_list)
```
The LOAD_CONST sequence before BUILD_STRING reconstructs key_str = 't_Jo3'. The LOAD_CONST integers before BUILD_LIST are the ciphertext. A BINARY_XOR opcode later in the disassembly confirms the cipher. Once you have both, decryption is a few lines of Python:
```python
from itertools import cycle

key_list = [ord(c) for c in 't_Jo3']
input_list = [4, 54, 41, ...]  # replace ... with all values from the LOAD_CONST sequence
flag = ''.join(chr(a ^ b) for a, b in zip(input_list, cycle(key_list)))
print(flag)  # picoCTF{N0t_sO_coNfus1ng_sn@ke_...}
```
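A self-contained round trip makes the repeating-key behaviour easy to check. This sketch uses a made-up plaintext, not the challenge data, and the helper name xor_bytes is hypothetical. It encrypts with the key, decrypts with cycle, and then shows what a plain zip recovers.

```python
from itertools import cycle


def xor_bytes(data, key):
    """XOR data against a repeating key."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


key = b"t_Jo3"
plaintext = b"example{not_a_real_flag}"  # hypothetical stand-in for a flag

ciphertext = list(xor_bytes(plaintext, key))  # what a challenge would embed
recovered = xor_bytes(ciphertext, key)        # XOR is its own inverse

# Without cycle, zip stops after len(key) items.
truncated = bytes(b ^ k for b, k in zip(ciphertext, key))

print(recovered)  # b'example{not_a_real_flag}'
print(truncated)  # b'examp'
```

Working on bytes rather than chr/ord keeps the helper usable for keys or ciphertexts that are not printable.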
Use itertools.cycle when the key is shorter than the ciphertext. Plain zip(input_list, key_list) stops at the shorter sequence and silently truncates the flag. This is one of the most common decryption bugs in Python CTF scripts.
Decompiling .pyc files: which tool wins for which version
Decompilers reconstruct Python source from bytecode. The quality varies wildly by Python version. The short rule: use pycdc for Python 3.9 and later; use uncompyle6 for Python 2.x through 3.8. When both fail, fall back to dis and read the opcodes directly.
| Tool | Python version support | Install |
|---|---|---|
| pycdc | 3.0 through 3.13 (best for 3.9+) | Build from source (see below) |
| uncompyle6 | 2.x through 3.8 | pip install uncompyle6 |
| decompyle3 | 3.7 and 3.8 (fork of uncompyle6) | pip install decompyle3 |
| pycdas | Same as pycdc (ships alongside it) | Built with pycdc; raw disassembly output |
```bash
# uncompyle6 (Python 2.x to 3.8)
pip install uncompyle6
uncompyle6 chall.pyc

# decompyle3 (Python 3.7-3.8 alternative)
pip install decompyle3
decompyle3 chall.pyc

# pycdc (Python 3.9+ and anything uncompyle6 fails on)
git clone https://github.com/zrax/pycdc && cd pycdc
cmake . && make
./pycdc chall.pyc

# pycdas: raw disassembly (like dis but more annotated)
./pycdas chall.pyc
```
When a decompiler prints Internal error or produces garbled output, check the Python version first. The magic bytes in the first four bytes of the .pyc file tell you exactly which version compiled it. Feeding a Python 3.11 .pyc into uncompyle6 will always fail; switch to pycdc.
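The short rule is simple enough to encode directly. This is only a sketch of the rule as stated in this guide; the function name pick_decompiler is invented for the example.

```python
def pick_decompiler(version: str) -> str:
    """Apply the short rule: uncompyle6 up to 3.8, pycdc from 3.9 on."""
    major, minor = (int(part) for part in version.split(".")[:2])
    if (major, minor) <= (3, 8):
        return "uncompyle6"
    return "pycdc"


for v in ("2.7", "3.8", "3.9", "3.11"):
    print(v, "->", pick_decompiler(v))
```

Comparing (major, minor) tuples rather than strings matters here: as text, "3.10" sorts before "3.9", which would send a 3.10 file to the wrong tool.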
pycdas produces more annotated output than the dis module, including symbolic names for jump targets and cleaner constant formatting. When decompilation fails and raw dis output is hard to read, pycdas is a good middle ground.
When the decompiler fails, read the opcodes. The disassembler never fails.
Unpacking PyInstaller frozen executables
PyInstaller bundles a Python interpreter, all imported modules, and your script into a single ELF or PE file. At runtime, it extracts its contents to a temporary directory (the _MEIPASS folder) and runs the bundled .pyc. From a reversing perspective, all the Python bytecode is still there, just compressed inside the binary.
The fingerprint is reliable: strings ./chall | grep -i 'PyInstaller\|_MEIPASS\|pyi-' returns hits on any PyInstaller binary. Once confirmed, run pyinstxtractor to extract the contents:
```bash
# Detect
strings ./chall | grep -i 'PyInstaller\|_MEIPASS'

# Get pyinstxtractor (single-file script, no install needed)
wget https://raw.githubusercontent.com/extremecoders-re/pyinstxtractor/master/pyinstxtractor.py

# Extract
python3 pyinstxtractor.py ./chall
# Creates: chall_extracted/

# Find the main script (same name as the binary, no .pyc extension usually)
ls chall_extracted/
file chall_extracted/chall

# Decompile
cp chall_extracted/chall chall_main.pyc
pycdc chall_main.pyc
```
Two gotchas that catch people the first time:
- Missing magic bytes. pyinstxtractor usually prepends the correct magic bytes automatically, but if decompilation fails with "bad magic number," add the header manually. The correct magic for the Python version pyinstxtractor reports is the first four bytes of chall_extracted/struct.pyc. Copy those four bytes, append twelve zero bytes to fill out the 16-byte header, and prepend the whole 16 bytes to your main .pyc.
- Wrong entry point. The file in the extracted folder named after the binary is the entry point, but large PyInstaller bundles also contain many library .pyc files. Look for the one whose name matches the binary, or run grep -rl 'picoCTF\|flag\|password' chall_extracted/ to find which .pyc contains the relevant logic.
```bash
# If pycdc reports 'bad magic number', fix the header manually.
# Take the 4 magic bytes from a bundled .pyc that has an intact header:
head -c 4 chall_extracted/struct.pyc > header.bin

# Append 12 zero bytes (rest of the 16-byte header used by Python 3.7+)
head -c 12 /dev/zero >> header.bin

# Prepend the header to the extracted script and decompile
cat header.bin chall_extracted/chall > chall_main.pyc
pycdc chall_main.pyc
```
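The same header repair can be done in Python. The sketch below uses a stand-in body so that it runs on its own: the helper name add_pyc_header is hypothetical, and the demo borrows the running interpreter's magic, whereas a real extraction would take the four bytes from a bundled .pyc such as struct.pyc.

```python
import importlib.util
import marshal


def add_pyc_header(body: bytes, magic: bytes) -> bytes:
    """Prepend a 16-byte header: 4 magic bytes followed by 12 zero bytes."""
    if len(magic) != 4:
        raise ValueError("magic must be exactly four bytes")
    return magic + b"\x00" * 12 + body


# Stand-in for a headerless extracted script: a marshalled code object.
body = marshal.dumps(compile("answer = 6 * 7", "<demo>", "exec"))

# Assumption for the demo only: use the magic of the running interpreter.
fixed = add_pyc_header(body, importlib.util.MAGIC_NUMBER)

# Round trip: skip the header, unmarshal, and run the toy code object.
namespace = {}
exec(marshal.loads(fixed[16:]), namespace)
print(namespace["answer"])  # 42
```

The twelve zero bytes stand in for the flags, timestamp, and source-size fields, which decompilers do not need.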
Inside the PYZ-00.pyz_extracted folder, all the bundled modules are there as individual .pyc files. The main application logic is almost always in the top-level file matching the binary name, not inside this folder.
Peeling exec() obfuscation layers
The most common obfuscation pattern in Python CTF challenges is a self-decoding script: an outer script that decodes an inner payload and runs it via exec(). The outer script might base64-decode, XOR-decrypt, Fernet-decrypt, or zlib-decompress the payload before executing it. Multiple layers stack.
The fundamental insight never changes. At some point, every layer has the decoded payload in a variable, and it calls exec(payload) or exec(payload.decode()). Replacing that one call with print(payload.decode()) exposes the inner script without executing it.
```python
# Pattern 1: simple base64 exec
# Original: exec(base64.b64decode(b'aW1wb3J0...'))
# Fix:      print(base64.b64decode(b'aW1wb3J0...').decode())

# Pattern 2: zlib + base64 chain
# Original: exec(zlib.decompress(base64.b64decode(b'eJy...')).decode())
# Fix:      print(zlib.decompress(base64.b64decode(b'eJy...')).decode())

# Pattern 3: marshal code object (cannot print as string, use dis instead)
import marshal, base64, dis
code = marshal.loads(base64.b64decode(b'YwAAAAA...'))
dis.dis(code)  # disassemble the hidden code object

# Pattern 4: multiple nested layers
# Run the outer print to get layer 2, then repeat
```
The picoCTF 2022 challenge unpackme.flag.py uses Fernet encryption (AES-128-CBC with HMAC-SHA256) with a hardcoded key. The outer script stores the key as a string literal, constructs a Fernet object, decrypts the embedded ciphertext, and calls exec(plain.decode()). Replacing the exec with a print reveals the inner Python script, which itself prints the flag when run.
```bash
# Inspect any self-decoding script without running the payload
# Edit unpackme.flag.py: change exec(plain.decode()) to print(plain.decode())
sed -i 's/exec(plain.decode())/print(plain.decode())/' unpackme.flag.py
python3 unpackme.flag.py
# Prints the inner source code; the flag is inside
```
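When editing the file is inconvenient, a variation on the same idea is to run the outer layer with exec shadowed by a harmless stand-in. This is a sketch under stated assumptions: the two-layer script is built in-process purely for the demo, and the trick only catches a direct exec(...) call by name.

```python
import base64
import zlib

# Build a hypothetical two-layer script so the example is self-contained.
inner = "print('inner layer reached')"
blob = base64.b64encode(zlib.compress(inner.encode()))
outer = (
    "import base64, zlib\n"
    f"exec(zlib.decompress(base64.b64decode({blob!r})).decode())\n"
)

# Run the outer layer with exec replaced by a function that only records
# its argument. Globals are looked up before builtins, so the decoded
# payload is captured instead of executed.
captured = []
exec(outer, {"exec": captured.append})

print(captured[0])  # the inner source, never executed
```

The outer layer itself still runs, so read it first and confirm it only decodes. A payload that reaches exec through another route (eval, builtins, compile) slips past this and needs the manual print substitution.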
Never use eval() to parse unknown output from a CTF server or challenge binary. Use ast.literal_eval() instead. It parses only Python literals (strings, numbers, lists, dicts, tuples) and refuses to execute arbitrary code. The picoCTF 2025 challenge quantum-scrambler shows exactly why: the server returns a nested list literal that looks safe to eval but should never be trusted unconditionally.
The marshal pattern is the trickiest variant. When the payload is a compiled code object rather than a string, you cannot print it as text. The code object is the same format Python uses internally for function bodies. Disassembling it with dis.dis(code) gives you the opcodes directly, which is everything you need to trace the flag logic.
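Before reading opcodes, it often helps to dump what a marshalled code object carries. The sketch below fabricates a small hidden payload for the demo (the string 'hunter2' and the helper all_consts are hypothetical) and lists its constants and names without running it.

```python
import base64
import marshal

# Hypothetical hidden payload: a code object, marshalled and base64-encoded.
hidden = compile("secret = 'hunter2'\nok = len(secret) == 7", "<hidden>", "exec")
blob = base64.b64encode(marshal.dumps(hidden))

# What you would do with a real payload: decode, unmarshal, inspect.
code = marshal.loads(base64.b64decode(blob))


def all_consts(co):
    """Collect constants from a code object and any nested code objects."""
    found = []
    for const in co.co_consts:
        if hasattr(const, "co_code"):
            found.extend(all_consts(const))
        else:
            found.append(const)
    return found


print("names: ", code.co_names)
print("consts:", all_consts(code))
```

marshal.loads only succeeds when the interpreter matches the version that produced the payload, so the earlier version check applies here too.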
The five CTF patterns: what flag checkers actually do
After working through dozens of Python reversing challenges, the same five structures keep appearing. Recognizing the pattern takes you from "how do I even start" to "oh I know exactly what to do" in seconds.
1. Hardcoded comparison loop
The simplest pattern. The script checks user input character by character against a hardcoded expected value. In source, it is a for loop with if char != expected[i]. In bytecode, it is COMPARE_OP inside a loop. The expected value is visible as a string literal or a list of integer ASCII codes in the LOAD_CONST sequence.
crackme-py (picoCTF 2021) is the cleanest example. The script encodes the flag with a simple Caesar-style rotation and compares the result to a hardcoded string. Read the rotation function, invert it, apply it to the expected string.
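Inverting a rotation is mechanical once the alphabet and shift are known. The sketch below is generic and does not reproduce the challenge's actual alphabet or shift; ALPHABET, the shift of 13, and the sample string are all placeholders.

```python
import string

ALPHABET = string.ascii_lowercase  # placeholder; use the challenge's alphabet


def rotate(text: str, shift: int, alphabet: str = ALPHABET) -> str:
    """Shift each character that appears in the alphabet; leave others alone."""
    out = []
    for ch in text:
        if ch in alphabet:
            out.append(alphabet[(alphabet.index(ch) + shift) % len(alphabet)])
        else:
            out.append(ch)
    return "".join(out)


def unrotate(text: str, shift: int, alphabet: str = ALPHABET) -> str:
    """The inverse is the same rotation with the shift negated."""
    return rotate(text, -shift, alphabet)


expected = rotate("toy_flag", 13)  # what a checker would hardcode
print(expected)                    # gbl_synt
print(unrotate(expected, 13))      # toy_flag
```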
2. XOR with key from constants
The ciphertext is stored as a list of integers. The key is a string built from individual LOAD_CONST characters. The decryption is a repeating-key XOR loop. This pattern is covered in the dis section above, with weirdSnake as the example.
3. Custom cipher in obfuscated source
Some challenges ship a .py file where the logic is deliberately obfuscated: variable names replaced with single letters, whitespace stripped, string operations inlined. The cipher itself is simple (Caesar, Vigenere, XOR, character remapping), but reading the source takes effort. bloat.py (picoCTF 2022) is a classic. The script uses cryptic variable names but the actual logic is a character lookup in a static alphabet. Read the lookup table and invert it.
For this pattern, patching is often faster than reversing. Find the comparison and replace it with a print. The patchme.py (picoCTF 2022) challenge takes this literally: the password is in the source, and grepping for the assignment finds it immediately.
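For the lookup-table style of obfuscation, decoding amounts to indexing into the static alphabet. The alphabet and index list below are assumptions made for the example, not values taken from any challenge.

```python
# Hypothetical static alphabet: printable ASCII from '!' to '~'.
ALPHABET = "".join(chr(c) for c in range(33, 127))


def decode_indices(indices, alphabet=ALPHABET):
    """Turn a list of alphabet positions back into text."""
    return "".join(alphabet[i] for i in indices)


def encode_text(text, alphabet=ALPHABET):
    """The inverse mapping, handy for checking your reading of the table."""
    return [alphabet.index(ch) for ch in text]


obfuscated = [71, 68, 75, 75, 78]  # made-up index list
print(decode_indices(obfuscated))  # hello
```

Obfuscated sources often spell strings as a[71]+a[68]+..., so collecting the indices with a regex and feeding them to a helper like this is usually faster than renaming variables by hand.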
4. Exec layer chain
Covered in detail in the obfuscation section above. The fingerprint is an exec( call anywhere in the file. Peel each layer by replacing the exec with a print and running the script again. Two or three layers is the typical maximum in CTF challenges.
5. Bytecode virtual machine
The most advanced pattern. The challenge ships a custom bytecode format and a Python interpreter for it. Your task is to understand the bytecode ISA (instruction set architecture) and trace execution to find the flag. The Virtual Machine 0 and Virtual Machine 1 challenges from picoCTF 2023 both use this pattern: a Python script implements a tiny VM, and the bytecode program for that VM encodes the flag check. Read the VM loop, trace the bytecode, extract the expected value.
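The read-the-loop, trace-the-bytecode workflow can be illustrated with a toy. Everything in this sketch is invented for the example (the four-opcode ISA, the byte encoding, the sample secret) and is not the format used by the picoCTF challenges. It shows the general shape: write a disassembler for the custom ISA, spot the repeating check pattern, and invert it statically.

```python
# Hypothetical toy ISA: opcode byte -> (name, operand count).
OPS = {
    0x01: ("PUSH", 1),
    0x02: ("LOAD_INPUT", 1),
    0x03: ("XOR", 0),
    0x04: ("ASSERT_EQ", 0),
}


def disassemble(program: bytes):
    """Turn raw VM bytes into a list of (name, *operands) tuples."""
    listing, pc = [], 0
    while pc < len(program):
        name, nargs = OPS[program[pc]]
        listing.append((name, *program[pc + 1 : pc + 1 + nargs]))
        pc += 1 + nargs
    return listing


def assemble_check(secret: bytes, key: int) -> bytes:
    """Emit a per-character check: input[i] ^ key must equal a stored constant."""
    program = bytearray()
    for i, ch in enumerate(secret):
        program += bytes([0x02, i, 0x01, key, 0x03, 0x01, ch ^ key, 0x04])
    return bytes(program)


def recover(listing):
    """Invert each LOAD_INPUT / PUSH key / XOR / PUSH expected / ASSERT_EQ group."""
    chars = {}
    for n in range(0, len(listing), 5):
        (_, index), (_, key), _, (_, expected), _ = listing[n : n + 5]
        chars[index] = key ^ expected
    return bytes(chars[i] for i in sorted(chars))


program = assemble_check(b"toy_flag", 0x2A)
listing = disassemble(program)
print(listing[:5])
print(recover(listing))  # b'toy_flag'
```

Real challenge VMs have jumps and registers, so a static pattern match like recover may not be enough. In that case, instrument the VM loop to log each comparison instead.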
picoCTF challenge map
Python reversing challenges span nearly every picoCTF event. The difficulty ladder is gradual enough that beginners can start at the source-reading end and work up to bytecode VM challenges without gaps.
| Challenge | Pattern | Key skill |
|---|---|---|
| patchme.py (2022) | Hardcoded comparison | Grep source for password assignment |
| bloat.py (2022) | Obfuscated source cipher | Read alphabet lookup, replace exec with print |
| unpackme.flag.py (2022) | Exec layer (Fernet) | Swap exec for print to expose inner script |
| crackme-py (2021) | Rotation cipher in source | Read decode function, apply inverse |
| weirdSnake (2024) | Bytecode XOR | dis, extract LOAD_CONST sequence, XOR decrypt |
| quantum-scrambler (2025) | Permutation cipher in source | ast.literal_eval, reverse the shuffle |
| Virtual Machine 0 (2023) | Bytecode VM | Read VM loop, trace bytecode ISA |
| Virtual Machine 1 (2023) | Bytecode VM (harder) | Build a disassembler for the custom ISA |
The picoGym Exclusive Picker I, Picker II, and Picker III challenges are a good warm-up series for Python source reading, each adding one more layer of indirection before you can call the flag-printing function.
Tool quick reference
Decision tree
- Run file ./chall and strings ./chall | head -30.
- Plain .py source with exec( inside? Swap exec for print, run, repeat.
- First four bytes end in 0d 0d 0a? It is a .pyc. Disassemble it with dis (via marshal) or pick a decompiler from the version table.
- strings shows "PyInstaller" or "_MEIPASS"? Run pyinstxtractor, find the main .pyc, decompile.
- Decompiler fails? Read the opcodes with dis or pycdas. Every constant the program compares against is visible in LOAD_CONST.
Install cheat sheet
```bash
# uncompyle6 (Python 2.x - 3.8)
pip install uncompyle6
uncompyle6 chall.pyc

# decompyle3 (Python 3.7-3.8 alternative)
pip install decompyle3
decompyle3 chall.pyc

# pycdc / pycdas (Python 3.9+ and anything uncompyle6 fails on)
git clone https://github.com/zrax/pycdc && cd pycdc
cmake . && make
./pycdc chall.pyc    # decompile to source
./pycdas chall.pyc   # raw disassembly

# pyinstxtractor (frozen executables)
wget https://raw.githubusercontent.com/extremecoders-re/pyinstxtractor/master/pyinstxtractor.py
python3 pyinstxtractor.py ./chall

# dis (built-in; run under the same Python version that compiled the file)
python3 -c "import dis, marshal; f = open('chall.pyc', 'rb'); f.read(16); dis.dis(marshal.loads(f.read()))" | less
```
One last thing. The reason Python reversing feels hard to beginners is not the bytecode. It is the decision paralysis at the start: which tool, which version, which file. Once you have a triage habit, the actual reversing is just reading Python with slightly worse formatting. You already know the language. The disassembler output looks alien the first time, but LOAD_CONST and COMPARE_OP are just function calls and if statements in disguise. Read them that way.
For the binary side of reversing (C and C++ executables), the Ghidra Reverse Engineering guide covers the equivalent workflow. For symbolic execution that can solve Python flag checkers automatically when the logic is complex, see the angr CTF Tutorial.