Description
Matryoshka dolls are nested - can you extract the flag from this nested image? Download dolls.jpg.
Setup
Download dolls.jpg.
wget <url>/dolls.jpg
Solution
Step 1
Scan for embedded files with binwalk
Observation
I noticed the challenge description mentioned a 'nested image' and provided a JPEG file, which suggested that data was appended after the JPEG EOI marker and that binwalk, a tool designed to detect embedded file signatures at every byte offset, was the right way to reveal and extract the hidden contents.
Run binwalk on dolls.jpg. It will report additional file signatures embedded inside the JPEG - specifically nested PNG files and ZIP archives. Extract everything with the --dd flag.
binwalk dolls.jpg
binwalk --dd='.*' dolls.jpg
Expected output
picoCTF{336cf6d5...}
What didn't work first
Tried: Running steghide extract -sf dolls.jpg to pull out the hidden file
steghide works by hiding data within the pixel values of an image using a passphrase-based algorithm. dolls.jpg uses the polyglot technique - data is appended after the JPEG EOI marker as a real ZIP archive, not embedded in pixels. steghide will prompt for a passphrase and then report 'could not extract any data' because there is no steghide payload; binwalk is the right tool for magic-byte-based extraction.
Tried: Running binwalk dolls.jpg without --dd to extract the files
binwalk without --dd only reports the offsets and signatures it finds - it prints lines like '0x... JPEG image data' and '0x... Zip archive data' but writes nothing to disk. You need --dd='.*' (or -e for common types) to actually carve and write the embedded files into a _dolls.jpg.extracted/ directory.
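The difference between reporting a signature and carving it can be sketched with standard tools. This is a rough illustration of the idea, not binwalk's actual implementation; the helper names find_sig and carve_from are hypothetical, and the approach (hex-dumping the whole file and scanning it in awk) is only practical for small files.

```bash
#!/usr/bin/env bash
# find_sig FILE HEXSIG
# Print the decimal byte offset of every occurrence of HEXSIG (lowercase hex,
# e.g. 504b0304 for a ZIP local file header) inside FILE.
find_sig() {
  local file=$1 sig=$2
  # Dump the file as one long hex string, then compare at every byte boundary.
  od -An -v -tx1 "$file" | tr -d ' \n' |
    awk -v sig="$sig" '{
      n = length(sig)
      for (i = 1; i + n - 1 <= length($0); i += 2)   # step 2 hex digits = 1 byte
        if (substr($0, i, n) == sig) print (i - 1) / 2
    }'
}

# carve_from FILE OFFSET OUT
# Copy everything from byte OFFSET (0-based) to the end of FILE into OUT.
carve_from() {
  tail -c +"$(( $2 + 1 ))" "$1" > "$3"   # tail -c +N is 1-based
}

# Example (assumes dolls.jpg is in the current directory):
#   find_sig dolls.jpg 504b0304          # report only, like plain binwalk
#   carve_from dolls.jpg <offset> inner.zip   # write to disk, like --dd
```

Reporting and carving are kept as separate functions on purpose: the first only reads, and nothing reaches the disk until the second is called with an offset you chose.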
Learn more
binwalk scans a binary file for known file format magic bytes (signatures). A JPEG file starts with bytes FF D8 FF; a ZIP file starts with 50 4B 03 04; a PNG with 89 50 4E 47. binwalk identifies these signatures at any offset, revealing files hidden after or within the primary file.
The --dd='.*' flag tells binwalk to extract all recognized signatures to a directory named _dolls.jpg.extracted/. Without this flag, binwalk only reports offsets but does not extract.
How file format magic bytes work: Almost every binary file format begins with a distinctive byte sequence called a magic number. JPEG: FF D8 FF E0. PNG: 89 50 4E 47 0D 0A 1A 0A. ZIP/Office: 50 4B 03 04. PDF: 25 50 44 46 (%PDF). binwalk maintains a database of hundreds of these signatures and scans the entire file at every byte offset for any match. This is why it finds embedded files even when they appear at arbitrary positions inside another file.
Why appended data does not break a JPEG: JPEG parsers read the image from the SOI (Start of Image) marker FF D8 to the EOI (End of Image) marker FF D9. Any bytes after the EOI marker are silently ignored by most image viewers. This means a ZIP archive appended after the JPEG EOI creates a valid JPEG that also contains a valid ZIP - a "polyglot" file. This technique is distinct from LSB steganography: the embedded data is not hidden within the image pixels, it is simply appended as a second file.
Alternative extraction with foremost: foremost is another file carving tool that extracts embedded files by magic number. It may handle some edge cases differently from binwalk. If binwalk's extraction produces a corrupt inner file, try foremost -i dolls.jpg -o output/ as an alternative. For ZIP archives specifically, unzip -l dolls.jpg works directly because the ZIP central directory at the end of the file is found regardless of what precedes it.
Step 2
Repeat extraction on each nested image
Observation
I noticed the challenge is named 'Matryoshka Doll', a direct reference to Russian nesting dolls, which indicated that the first binwalk extraction would reveal another image container rather than the flag, requiring the same extraction process to be repeated on each successive inner image.
Navigate into the extracted directory and find the nested image file. Run binwalk --dd='.*' on it. Repeat this process four times total - each image contains another image inside it. After the fourth extraction, you will find flag.txt. The paths below are relative to each layer's extraction directory, so descend into the newly created extracted directory before each run.
cd _dolls.jpg.extracted/
binwalk --dd='.*' base_images/2_c.jpg
binwalk --dd='.*' base_images/3_c.jpg
binwalk --dd='.*' base_images/4_c.jpg
cat flag.txt
# Or, if each layer is a password-protected zip whose password sits in the previous layer (stops as soon as an unzip fails):
while ls *.zip 1>/dev/null 2>&1; do unzip -P "$(cat hint.txt 2>/dev/null)" *.zip && rm *.zip || break; done
What didn't work first
Tried: Running binwalk --dd='.*' on dolls.jpg a second time instead of navigating into _dolls.jpg.extracted/ and running it on the inner image
Re-running binwalk on the original dolls.jpg just re-extracts the same first-level layer into the same output directory, overwriting what was already there. Each nesting layer is a separate file - you must cd into _dolls.jpg.extracted/ (or the equivalent subdirectory), locate the inner JPEG that was extracted, and run binwalk on that new file to peel the next layer.
Tried: Looking for flag.txt directly inside _dolls.jpg.extracted/ after the first binwalk extraction
The challenge nests four layers deep, so flag.txt only appears inside the innermost archive. After the first extraction you find a JPEG (2_c.jpg), not the flag. You have to repeat the binwalk extraction three more times, each time descending into the newly created extracted directory and targeting the next inner image, before flag.txt becomes reachable.
Learn more
This is a steganography-by-polyglot technique: a file that is simultaneously a valid JPEG (the outer image) and a ZIP archive (containing the inner image). Most image viewers display only the JPEG portion and ignore the trailing data. The challenge nests four layers deep, mimicking a Matryoshka (Russian nesting doll).
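To see why the outer file still opens as an image, you can measure how many bytes sit after the JPEG end-of-image marker. The sketch below is a hypothetical helper (trailing_bytes is not part of any tool) and assumes the first FF D9 in the file is the real EOI, which may not hold for images that carry an embedded thumbnail.

```bash
#!/usr/bin/env bash
# trailing_bytes FILE
# Print how many bytes follow the first FF D9 (JPEG EOI) marker in FILE.
# Assumption: the first FF D9 is the true end of the image data.
trailing_bytes() {
  local file=$1 size eoi
  size=$(wc -c < "$file")
  eoi=$(od -An -v -tx1 "$file" | tr -d ' \n' |
        awk '{ for (i = 1; i + 3 <= length($0); i += 2)
                 if (substr($0, i, 4) == "ffd9") { print (i - 1) / 2; exit } }')
  [ -n "$eoi" ] || return 1            # no EOI marker found
  echo $(( size - eoi - 2 ))           # skip the 2-byte marker itself
}

# Building a polyglot is plain concatenation (file names are placeholders):
#   cat cover.jpg secret.zip > polyglot.jpg
#   trailing_bytes polyglot.jpg   # prints the size of secret.zip under the assumption above
```

A result of 0 suggests nothing was appended; a large value is a hint that a second file follows the image.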
The base_images/ directory is the path stored inside each ZIP archive, which is why it reappears at every layer; binwalk's own output directory is the one named _<file>.extracted/. The final flag.txt appears as a plain text file inside the innermost archive.
Automating repeated extraction: With four nesting layers, running binwalk manually four times is manageable - but for deeper nesting you could script it. A simple Bash loop such as while binwalk --dd='.*' *.jpg 2>/dev/null && cd _*.extracted/ 2>/dev/null; do :; done descends through the layers and stops once there is no new extraction directory to enter (adjust the cd target if the inner image sits in a subdirectory such as base_images/). Real forensic tools like Autopsy handle recursive extraction internally, tracking provenance of each extracted file back to its source offset.
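Since unzip can read an archive that has image data in front of it, the layer-peeling can also be sketched without binwalk. This swaps in unzip for the extraction step and is only an illustration: peel_layers is a hypothetical helper, the depth cap of 20 is arbitrary, and it assumes each layer holds exactly one inner .jpg until flag.txt appears.

```bash
#!/usr/bin/env bash
# peel_layers IMAGE [WORKDIR]
# Repeatedly unzip the archive appended to IMAGE, following the first *.jpg
# found in each layer, and print flag.txt once it turns up.
peel_layers() {
  local current=$1 workdir=${2:-layers} depth=0 out flag rc
  while [ -n "$current" ] && [ "$depth" -lt 20 ]; do   # arbitrary safety cap
    depth=$((depth + 1))
    out="$workdir/layer$depth"
    mkdir -p "$out"
    unzip -o -q "$current" -d "$out" >/dev/null 2>&1
    rc=$?
    # unzip exits with 1 (warning) when extra bytes precede the archive,
    # so 0 and 1 both count as a successful extraction here.
    [ "$rc" -le 1 ] || return 1
    flag=$(find "$out" -type f -name 'flag.txt' | head -n 1)
    if [ -n "$flag" ]; then
      cat "$flag"
      return 0
    fi
    current=$(find "$out" -type f -name '*.jpg' | head -n 1)
  done
  return 1
}

# Example: peel_layers dolls.jpg
```

Each layer goes into its own numbered directory, so nothing is overwritten and every intermediate image stays available for inspection.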
Interactive tools
- Stegall: Drop any file and Stegall runs every applicable steg technique in parallel: LSB sweeps, bit planes, spectrograms, polyglot carving, metadata, whitespace decode, and a 6-layer base/ROT/XOR/zlib cascade. Recursively unpacks results and surfaces flag matches.
- Hex Viewer: View text or raw hex bytes as an xxd-style hex dump with byte offset, hex columns, and ASCII sidebar. Highlights printable characters and null bytes.
- Strings Extractor: Pull printable text from any binary, library, or image. ASCII and UTF-16 detection, configurable minimum length, flag-like highlight, no command line needed.
Flag
picoCTF{336cf6d5...}
binwalk detects file format magic bytes inside any binary - files can be nested arbitrarily and binwalk will find them all.