Description
Your goal is to find the flag. The server compresses your input together with the flag, then encrypts the result. Exploit the compression oracle to recover the flag character by character.
Setup
Connect to the service.
nc mercury.picoctf.net <PORT_FROM_INSTANCE>Solution
Walk me through it- Step 1Understand the CRIME/BREACH compression oracleThe server compresses your input + the secret flag together before encrypting. If your input shares a prefix with the flag, compression will produce shorter output. By guessing the flag one character at a time and observing output lengths, you can recover the flag.
Learn more
CRIME and BREACH attacks exploit the interaction between data compression and encryption. The attack works because:
- Compression algorithms like DEFLATE replace repeated substrings with back-references. If your input string matches a substring of the secret, the combined compressed output is shorter.
- When the server compresses (input + secret) together and you can observe the output length, you learn information about shared substrings between input and secret.
The signal you're listening for. A correct one-character guess shrinks the compressed output by roughly one byte; a wrong guess produces an output one byte longer. With block ciphers padding to 16-byte boundaries, that 1-byte signal can fall between two blocks and become invisible - mitigated by adding 1-15 bytes of padding before each guess to push the boundary.
Reading the oracle response. Whatever shape the server speaks, what you ultimately measure is byte count: HTTP responses give you a
Content-Lengthheader (or you countr.content); a netcat-style oracle just emits the ciphertext, countlen(p.recvall()). If the channel is plaintext-on-the-wire and you can sniff it,tcpdump -A -i lo port <n> -w cap.pcapplustshark -r cap.pcap -T fields -e tcp.lengives a tcp-level byte count.Character-by-character recovery: Suppose the flag is
picoCTF{XXXX}. Guess each unknown character:- Send
picoCTF{a+ padding, observe compressed length - Send
picoCTF{b+ padding, observe compressed length - ...repeat for all characters...
- The correct character produces the shortest compressed output
- Step 2Implement the oracle attackWrite a script that connects to the service, sends each candidate prefix, reads the response length, and identifies the shortest one as the correct next character.python
python3 - <<'EOF' from pwn import * import string CANDIDATES = string.printable def query(prefix): p = remote('mercury.picoctf.net', <PORT_FROM_INSTANCE>) p.recvuntil(b'input: ') # or whatever the prompt is p.sendline(prefix.encode()) response = p.recvall(timeout=2) p.close() return len(response) known = 'picoCTF{' while not known.endswith('}'): lengths = {} for c in CANDIDATES: candidate = known + c length = query(candidate) lengths[c] = length log.info(f"Trying {c!r}: length={length}") best = min(lengths, key=lengths.get) known += best log.success(f"Recovered so far: {known}") print(f"Flag: {known}") EOFLearn more
Practical tips for compression oracle attacks:
- Length differences may be small (1 byte or even less if block ciphers round up to block boundaries). Average multiple queries for the same input to reduce noise.
- If a block cipher is used after compression, the output length increases in 16-byte steps. Add padding to your input to push the boundary between blocks, making single-character length differences visible.
- The
zlibcompressor achieves better back-reference compression for longer matches. Starting from a known prefix (likepicoCTF{) that is already in the flag makes subsequent characters easier to detect.
Real-world CRIME defeated HTTPS SPDY compression (2012) and BREACH defeated HTTP body compression (2013). Both were mitigated by disabling response body compression or adding masking (secret mixing into the data stream). For more web-side compression and side-channel patterns, see web challenge bug patterns.
Flag
picoCTF{...}
Compression oracles leak information about shared substrings between attacker input and secrets - when input matches the flag prefix, the compressed output is shorter, enabling character-by-character recovery.