Description

Matryoshka dolls are nested - can you extract the flag from this nested image? Download dolls.jpg.

Setup

Download dolls.jpg.

bash

wget <url>/dolls.jpg

Solution


Companion reading: Introduction to Steganography Tools covers binwalk alongside zsteg, steghide, and Stegsolve; Linux CLI for CTFs explains the shell-loop idiom used to peel nested archives; Hex Dumps for CTFs shows how to spot the embedded magic bytes manually with xxd.

Step 1: Scan for embedded files with binwalk
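Run binwalk on dolls.jpg. It lists every file signature it recognizes: the image itself at offset 0 (binwalk goes by magic bytes rather than the extension, so it may label the outer image a PNG despite the .jpg name) and a ZIP archive embedded further in. Then extract with -e, which carves each embedded file and unpacks the ZIP.
bash
binwalk dolls.jpg
bash
binwalk -e dolls.jpg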
binwalk scans a binary file for known file format magic bytes (signatures). A JPEG file starts with bytes FF D8 FF; a ZIP file starts with 50 4B 03 04; a PNG with 89 50 4E 47. binwalk identifies these signatures at any offset, revealing files hidden after or within the primary file.
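To see the start-of-file check in isolation, here is a small shell sketch. sniff_type is a hypothetical helper, not part of binwalk: it reads the first 8 bytes with head and od and matches them against the three signatures listed above. Real tools such as file consult a far larger database.

```shell
#!/usr/bin/env bash
# sniff_type FILE (hypothetical helper)
# Prints the format implied by FILE's leading magic bytes.
# Only the JPEG, PNG and ZIP signatures from the text are checked.
sniff_type() {
  local magic
  # First 8 bytes as one lowercase hex string, e.g. "ffd8ffe000104a46".
  magic=$(head -c 8 -- "$1" | od -An -tx1 | tr -d ' \n')
  case $magic in
    ffd8ff*)   echo jpeg ;;
    89504e47*) echo png ;;
    504b0304*) echo zip ;;
    *)         echo unknown ;;
  esac
}
```

Usage: sniff_type dolls.jpg. Note that this only inspects offset 0; finding signatures deeper inside the file is the job of an every-offset scan.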
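The -e flag carves each recognized signature into a directory named _dolls.jpg.extracted/ (carved chunks are named after their hex offset) and runs the matching extraction utility on it, so the ZIP is unzipped in place. --dd='.*' also carves every signature, but only as raw chunks without running any extractor, which leaves you to unzip the carved archive by hand. Without an extraction flag, binwalk only reports offsets.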
How file format magic bytes work: Almost every binary file format begins with a distinctive byte sequence called a magic number. JPEG: FF D8 FF, usually followed by E0 (JFIF) or E1 (EXIF). PNG: 89 50 4E 47 0D 0A 1A 0A. ZIP (and ZIP-based formats such as Office documents): 50 4B 03 04. PDF: 25 50 44 46 (%PDF). binwalk maintains a database of hundreds of these signatures and checks every byte offset of the file for a match, which is why it finds embedded files even at arbitrary positions inside another file.
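The every-offset idea can be sketched in plain shell. find_magic is a hypothetical helper: od dumps the file as hex bytes and awk compares the signature at each position. It is a brute-force illustration of the principle, not binwalk's actual matching engine, and it holds the whole file in memory, which is fine for challenge-sized images.

```shell
#!/usr/bin/env bash
# find_magic FILE HEX (hypothetical helper)
# Prints every decimal offset where the byte sequence HEX (e.g. 504b0304) occurs.
find_magic() {
  # -v disables od's "*" collapsing of repeated lines so no bytes are lost.
  od -An -v -tx1 "$1" | awk -v sig="$2" '
    { for (i = 1; i <= NF; i++) b[n++] = $i }   # one hex byte per array slot
    END {
      sig = tolower(sig); len = length(sig) / 2
      for (i = 0; i + len <= n; i++) {          # try every starting offset
        hit = 1
        for (j = 0; j < len; j++)
          if (b[i + j] != substr(sig, 2 * j + 1, 2)) { hit = 0; break }
        if (hit) print i
      }
    }'
}
```

For example, find_magic dolls.jpg 504b0304 would print the offset of each ZIP local file header; an archive with several entries yields one hit per entry.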
Why appended data does not break a JPEG: JPEG parsers read the image from the SOI (Start of Image) marker FF D8 to the EOI (End of Image) marker FF D9. Any bytes after the EOI marker are silently ignored by most image viewers. This means a ZIP archive appended after the JPEG EOI creates a valid JPEG that also contains a valid ZIP - a "polyglot" file. This technique is distinct from LSB steganography: the embedded data is not hidden within the image pixels, it is simply appended as a second file.
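The appended-data layout can be split apart with a short shell sketch, assuming a simple JPEG. split_at_eoi is a hypothetical helper that cuts at the first FF D9 pair. That is a naive byte search: an EXIF thumbnail (itself a small JPEG with its own EOI) or a stray FF D9 inside a header segment would end the scan too early, so a robust tool would walk the marker segments instead.

```shell
#!/usr/bin/env bash
# split_at_eoi IN OUT_PREFIX (hypothetical helper)
# Writes OUT_PREFIX.jpg (bytes up to and including the first FF D9)
# and OUT_PREFIX.trailer (everything after it).
# Assumption: no EXIF thumbnail or stray FF D9 before the real EOI.
split_at_eoi() {
  local src=$1 prefix=$2 eoi
  # Offset just past the first FF D9 pair, or empty if there is none.
  eoi=$(od -An -v -tx1 "$src" | awk '
    { for (i = 1; i <= NF; i++) b[n++] = $i }
    END {
      for (i = 0; i + 1 < n; i++)
        if (b[i] == "ff" && b[i + 1] == "d9") { print i + 2; exit }
    }')
  [ -n "$eoi" ] || { echo "no EOI marker in $src" >&2; return 1; }
  head -c "$eoi" -- "$src" > "$prefix.jpg"           # the image proper
  tail -c +"$((eoi + 1))" "$src" > "$prefix.trailer"  # tail -c +N is 1-based
  echo "JPEG ends at byte $eoi; $(( $(wc -c < "$prefix.trailer") )) trailing bytes"
}
```

Running sniff-style checks on the .trailer file (or simply unzip -l on it) then shows what was appended.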
Alternative extraction with foremost: foremost is another file carving tool that extracts embedded files by magic number. It may handle some edge cases differently from binwalk. If binwalk's extraction produces a corrupt inner file, try foremost -i dolls.jpg -o output/ as an alternative. For ZIP archives specifically, unzip -l dolls.jpg works directly because the ZIP central directory at the end of the file is found regardless of what precedes it.
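unzip does not care what precedes the archive because it locates the End of Central Directory record (signature 50 4B 05 06) near the end of the file. The sketch below, a hypothetical has_trailing_zip helper, handles only the common case of an empty archive comment, where that record is exactly the last 22 bytes; with a comment present, unzip searches backwards through up to 64 KiB more.

```shell
#!/usr/bin/env bash
# has_trailing_zip FILE (hypothetical helper)
# Succeeds if FILE ends with a ZIP End Of Central Directory record.
# Assumption: empty archive comment, so the 22-byte EOCD is the file's tail.
has_trailing_zip() {
  local sig
  # First 4 of the last 22 bytes, as hex.
  sig=$(tail -c 22 "$1" | od -An -tx1 -N4 | tr -d ' \n')
  [ "$sig" = "504b0506" ]
}
```

Usage: has_trailing_zip dolls.jpg && unzip -l dolls.jpg.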
Step 2: Repeat extraction on each nested image
Enter the extraction directory, locate the next image (base_images/2_c.jpg), and run binwalk -e on it from its own directory so the new _2_c.jpg.extracted/ lands beside it. Each image holds the next one, so repeat: four extractions in total, counting Step 1, reach flag.txt.
bash
cd _dolls.jpg.extracted/base_images/
binwalk -e 2_c.jpg
bash
cd _2_c.jpg.extracted/base_images/
binwalk -e 3_c.jpg
bash
cd _3_c.jpg.extracted/base_images/
binwalk -e 4_c.jpg
bash
cat _4_c.jpg.extracted/flag.txt
bash
# Alternative pattern, not needed here: if each layer were a password-protected zip whose password sat in hint.txt from the previous layer
while set -- *.zip; [ -e "$1" ]; do unzip -o -P "$(cat hint.txt 2>/dev/null)" "$1" && rm -- "$1" || break; done
This is a steganography-by-polyglot technique: a file that is simultaneously a valid JPEG (the outer image) and a ZIP archive (containing the inner image). Most image viewers display only the JPEG portion and ignore the trailing data. The challenge nests four layers deep, mimicking a Matryoshka (Russian nesting doll).
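Building such a file needs nothing more than concatenation. The sketch below uses two hypothetical helpers: make_polyglot appends a payload to a cover image with cat, and same_prefix confirms that the output still begins with the cover's exact bytes, which is why a viewer that stops at the image's end marker renders the original picture.

```shell
#!/usr/bin/env bash
# make_polyglot COVER PAYLOAD OUT (hypothetical helper)
# OUT = COVER's bytes followed by PAYLOAD's bytes (e.g. a ZIP).
make_polyglot() {
  cat -- "$1" "$2" > "$3"
}

# same_prefix COVER OUT (hypothetical helper)
# Succeeds if OUT starts with exactly the bytes of COVER.
same_prefix() {
  local n
  n=$(( $(wc -c < "$1") ))   # arithmetic strips wc's padding on some systems
  head -c "$n" -- "$2" | cmp -s - "$1"
}
```

Usage: make_polyglot cover.jpg inner.zip doll.jpg && same_prefix cover.jpg doll.jpg.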
The base_images/ directory is not a binwalk naming convention: it is the folder path stored inside each ZIP archive, and unzip recreates it under the extraction directory. binwalk itself names the raw chunks it carves after their hex offset in the parent file. The final flag.txt appears as a plain text file inside the innermost archive.
Automating repeated extraction: with four nesting layers, running binwalk by hand four times is manageable, but binwalk also has a recursive mode aimed at exactly this case: binwalk -Me dolls.jpg, where -M (--matryoshka) re-scans every extracted file, up to 8 levels deep by default (-d/--depth changes the limit). Forensic suites such as Autopsy likewise extract nested archives automatically and record which parent file each extracted file came from.
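A more defensive scripted descent might look like the sketch below. peel and extract_layer are hypothetical helpers. The loop assumes each unzipped doll keeps its archive folder (base_images/ here), so it sits at least two levels below the extraction directory, while binwalk's carved chunks sit directly inside it. A layer cap guards against an extraction that keeps reproducing the same image.

```shell
#!/usr/bin/env bash
# extract_layer FILE (hypothetical): the only binwalk-specific part.
extract_layer() { binwalk -e "$1" >/dev/null 2>&1; }

# peel FILE [MAX_LAYERS] (hypothetical): extract, find the next image, repeat.
peel() {
  local current=$1 max=${2:-10} depth=0 dir next
  while [ "$depth" -lt "$max" ]; do
    # Run from the file's own directory so _NAME.extracted/ lands beside it.
    ( cd "$(dirname "$current")" && extract_layer "$(basename "$current")" ) || true
    dir="$(dirname "$current")/_$(basename "$current").extracted"
    [ -d "$dir" ] || break              # nothing extracted: stop
    depth=$((depth + 1))
    # Assumption: the next doll is at depth >= 2 (inside base_images/).
    next=$(find "$dir" -mindepth 2 -type f -name '*.jpg' | head -n 1)
    if [ -z "$next" ]; then
      echo "layer $depth has no further image"
      find "$dir" -type f -name 'flag.txt' -exec cat {} +
      return 0
    fi
    current=$next
  done
  echo "stopped after $depth layers" >&2
  return 1
}
```

Source the file and run peel dolls.jpg from the directory holding the download. Keeping extract_layer separate makes it easy to swap in a different extractor.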

Flag

picoCTF{...}

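binwalk finds known file signatures at any offset in a binary. Nested layers are usually compressed inside their parent archive, so each layer must be extracted before the next one becomes visible; repeating the extraction (or letting binwalk -Me recurse) peels arbitrarily deep nesting.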

Matryoshka doll picoCTF 2021 Solution
