File types picoCTF 2022 Solution

Published: July 20, 2023

Description

The provided PDF is actually a shell archive containing multiple nested formats (ar, cpio, bzip2, gzip, lzip, etc.). Extract them sequentially until you reach ASCII text, then hex-decode the contents.

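Run the file as a shell archive (sh Flag.pdf) to extract a file named flag. If the script stops because the uudecode command is missing, install the sharutils package and rerun it.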

Inspect each resulting file with file and use the appropriate extractor (ar, cpio, bzip2, gzip, lzip, lz4, lzma, lzop, xz, etc.).

Once the final ASCII file appears, hex-decode it with xxd -r -p.

bash
sh Flag.pdf
ar x flag
cpio --file flag.cpio -i
bzip2 -d flag
gunzip flag.gz
lzip -d flag
unlz4 flag.lz4
lzma -d flag.lzma
lzop -d flag.lzop
unxz flag.xz
xxd -r -p flag
  1. Step 1: Peel each layer
    After each extraction, run file flag to identify the next compression/container type and use the matching extractor.
    bash
    # Peel one layer at a time. After EACH extraction, re-run 'file flag'
    # to see the next layer, then use the matching extractor below.
    # The trick: rename 'flag' to the extension the tool expects; the
    # decompressor strips it and writes 'flag' back, ready for the next pass.
    file flag                                   # identify the current layer, then run ONE of:
    
    mv flag flag.ar   && ar x flag.ar           # "ar archive"   -> extracts the inner member
    mv flag flag.cpio && cpio -idu < flag.cpio  # "cpio archive" -> extracts the inner file
    mv flag flag.bz2  && bunzip2 flag.bz2       # "bzip2 compressed"
    mv flag flag.gz   && gunzip  flag.gz        # "gzip compressed"
    mv flag flag.xz   && unxz    flag.xz        # "XZ compressed"
    mv flag flag.lzma && unlzma  flag.lzma      # "LZMA compressed"
    mv flag flag.lz   && lzip -d flag.lz        # "lzip compressed"
    mv flag flag.lz4  && unlz4 -f flag.lz4 flag # "LZ4 compressed"
    mv flag flag.lzo  && lzop -d flag.lzo       # "lzop compressed"
    
    # Repeat until 'file flag' reports ASCII text, then hex-decode (next step).
    Learn more

    The file command reads magic bytes - typically the first few bytes of the file - to identify the format regardless of extension. This works because most container and compression formats start with a distinctive signature in the header. Quick reference:

    Magic bytes (leading bytes of the file):
    
      gzip       1f 8b 08
      bzip2      42 5a 68                    ("BZh")
      xz         fd 37 7a 58 5a 00           ("\xfd7zXZ\0")
      lzma       5d 00 00                    (typical header; no reliable magic)
      lzip       4c 5a 49 50                 ("LZIP")
      lz4        04 22 4d 18
      lzop       89 4c 5a 4f 00 0d 0a 1a 0a  ("\x89LZO\0\r\n\x1a\n")
      cpio       30 37 30 37 30              (ASCII "07070")
      ar         21 3c 61 72 63 68 3e 0a     ("!<arch>\n")
      zip / jar  50 4b 03 04                 ("PK\x03\x04")
      PNG        89 50 4e 47                 ("\x89PNG")
      ELF        7f 45 4c 46                 ("\x7fELF")
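
The lookup that file performs can be approximated in a few lines. This is a minimal Python sketch that assumes only signatures from the quick reference; SIGNATURES and identify are illustrative names, and the real file command uses a much larger database with offset-based rules.

```python
import bz2
import gzip
import lzma

# (prefix, label) pairs taken from the quick-reference table.
# The weak lzma prefix goes last so stronger signatures win.
SIGNATURES = [
    (b"\xfd7zXZ\x00", "xz"),
    (b"!<arch>", "ar"),
    (b"07070", "cpio"),
    (b"\x04\x22\x4d\x18", "lz4"),
    (b"PK\x03\x04", "zip"),
    (b"\x89PNG", "png"),
    (b"\x7fELF", "elf"),
    (b"BZh", "bzip2"),
    (b"\x1f\x8b", "gzip"),
    (b"\x5d\x00\x00", "lzma"),
]


def identify(data: bytes) -> str:
    """Return the label of the first signature that prefixes data."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


# Demo on in-memory samples so nothing has to exist on disk.
print(identify(gzip.compress(b"hello")))
print(identify(bz2.compress(b"hello")))
print(identify(lzma.compress(b"hello")))
print(identify(b"7069636f435446"))
```

For a file on disk, pass in the first few bytes, for example identify(open("flag", "rb").read(16)).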

    Why ship the chain inside a shell archive (shar) wearing a PDF extension? Layered misdirection. PDF makes you expect a binary you'd open in a viewer, not a script you'd execute. shar is an old Unix distribution format: a self-extracting shell script that recreates files via inline commands. The format is harmless in itself, although running one executes whatever commands it contains; the trick is that file Flag.pdf identifies it as a shell script regardless of extension.

    Entropy heuristic. Compressed/encrypted data has high entropy (~7.99/8.0 bits per byte); plaintext is around 4-5. ent flag or python3 -c "import collections, math; b=open('flag','rb').read(); print(-sum((c/len(b))*math.log2(c/len(b)) for c in collections.Counter(b).values()))" gives you a quick check: if entropy stays high, you're still wrapped; if it drops, you've hit text or hex.
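
The one-liner is easier to reuse as a function. The sketch below is a readable version of the same calculation; shannon_entropy is an illustrative name. One caveat when applying it to this challenge: a sample of n bytes can never score above log2(n) bits per byte, so a compressed file of only a few hundred bytes will land well below 7.99 even though it is still wrapped.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of data in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)  # byte value -> number of occurrences
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Random bytes stand in for compressed data; hex text uses only 16 symbols,
# so it cannot exceed 4 bits per byte.
print(round(shannon_entropy(os.urandom(65536)), 2))
print(round(shannon_entropy(b"7069636f4354467b" * 64), 2))
```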

    More CLI recipes for this kind of file archaeology are in Linux CLI for CTF.

  2. Step 2: Decode the hex
    The final file is ASCII hex; xxd -r -p converts it back to readable bytes. Verify the output looks like a flag before submitting.
    bash
    head -c 64 flag                 # peek at the hex text first
    xxd -r -p flag                  # decode the hex to raw bytes
    xxd -r -p flag | head -c 200    # same, capped in case the output is long
    Learn more

    Hex round-trip example. If cat flag shows 706963 6f4354 467b66 316c65..., then xxd -r -p flag emits the bytes p i c o C T F { f 1 l e .... -r reverses (read hex, write bytes), -p selects "plain" format (just the hex digits, no offsets, no ASCII sidebar). That's the same format Python's bytes.hex() produces, so it round-trips cleanly.
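
The round trip can be checked in Python as well. This is a small sketch using the sample digits from the paragraph above; hex_to_bytes is an illustrative name. It relies on bytes.fromhex skipping ASCII whitespace (spaces in any version, newlines from Python 3.7 on) and raising ValueError on any other non-hex character.

```python
def hex_to_bytes(text: str) -> bytes:
    """Decode plain hex digits, ignoring whitespace between them."""
    return bytes.fromhex(text)


sample = "706963 6f4354 467b66 316c65"
decoded = hex_to_bytes(sample)
print(decoded)  # b'picoCTF{f1le'

# Round trip: bytes.hex() gives back the same digits without the spaces.
print(decoded.hex() == sample.replace(" ", ""))  # True
```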

    Sanity-check the result: it should start with picoCTF{. If it's still binary garbage, you missed an extraction layer. Run file on the "hex" input first; if file says it's still a compressed format, you stopped peeling too early.

    The general lesson: in forensics, extension is a hint and magic bytes are the truth. file and the magic-byte table above let you identify most containers regardless of how an attacker labelled them. Hex-dump fundamentals are covered in Hex Dumps for CTF.

Flag

picoCTF{f1len@m3_m@n1pul@t10n_f0r_0b2cur17y_3c7...}

Automating the extraction loop with `while file flag | grep ...` can save time on nested compression challenges.
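
The same idea can be sketched in Python by matching magic bytes directly instead of parsing file output. This is a partial illustration, not a full solver: it covers only the layers the standard library can decompress (gzip, bzip2, xz, lzma), so ar, cpio, lzip, lz4, and lzop would still need their command-line tools. LAYERS and peel are illustrative names.

```python
import bz2
import gzip
import lzma
from typing import List, Tuple

# (magic prefix, label, decompress function); weak lzma prefix last.
LAYERS = [
    (b"\x1f\x8b", "gzip", gzip.decompress),
    (b"BZh", "bzip2", bz2.decompress),
    (b"\xfd7zXZ\x00", "xz", lzma.decompress),
    (b"\x5d\x00\x00", "lzma", lzma.decompress),
]


def peel(data: bytes, max_layers: int = 32) -> Tuple[bytes, List[str]]:
    """Strip recognised compression layers until none matches."""
    seen: List[str] = []
    for _ in range(max_layers):  # guard against endless nesting
        for magic, label, decompress in LAYERS:
            if data.startswith(magic):
                data = decompress(data)
                seen.append(label)
                break
        else:
            break  # no signature matched: innermost payload reached
    return data, seen


# Demo: wrap some hex text in three layers, then unwrap it again.
wrapped = gzip.compress(bz2.compress(lzma.compress(b"7069636f435446")))
payload, layers = peel(wrapped)
print(layers)
print(bytes.fromhex(payload.decode("ascii")))
```

Matching on a prefix alone can misfire on plain data that happens to start with the same bytes, in which case the decompressor raises an error rather than returning garbage.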
