WhitePages picoCTF 2019 Solution

Published: April 2, 2026

Description

I stopped using color in my terminal. Decode the binary in the whitepages file.

Download the file.

bash
wget <url>/whitepages.txt

Solution

  1. Step 1
    Examine the file with xxd
    Observation
    I noticed the challenge is named WhitePages and the file looks blank when opened, which suggested the data is hidden in the byte values of whitespace characters that look alike but are distinct, rather than in any visible text.
    The file appears to contain only whitespace. Run xxd to see the actual hex values of each byte. You will find two different whitespace characters being used: the regular space (0x20) and the Unicode em space (U+2003, UTF-8 bytes E2 80 83).
    bash
    xxd whitepages.txt | head -20

    Expected output

    00000000: 2020 20e2 8083 20e2 8083 2020 e280 8320     ... ...  ... 
    00000010: e280 8320 20e2 8083 2020 e280 8320 e280  ...  ...  ... ..
    00000020: 8320 20e2 8083 20e2 8083 e280 8320 20e2  .  ... ......  .
    ...
    What didn't work first

    Tried: Open whitepages.txt in a text editor or cat it to the terminal to look for hidden content.

    Both approaches display whitespace characters as blank space, so the file appears completely empty. Terminals and editors draw both U+0020 and U+2003 as empty space, so the difference between them is not visible. Only a hex dump tool like xxd reveals the actual byte values, making the two-character binary alphabet visible.
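
A quick census of the characters in the file makes the same point without reading hex. The sketch below is an illustration rather than part of the original solve: `symbol_counts` is a hypothetical helper name, and the sample bytes are copied from the first row of the xxd output above and assumed to be representative.

```python
from collections import Counter

def symbol_counts(data: bytes) -> Counter:
    """Count each distinct character after decoding the bytes as UTF-8."""
    return Counter(data.decode("utf-8"))

# First 16 bytes from the xxd dump shown above.
sample = bytes.fromhex("2020 20e2 8083 20e2 8083 2020 e280 8320")

for ch, n in symbol_counts(sample).items():
    print(f"U+{ord(ch):04X}: {n}")
# U+0020: 7
# U+2003: 3
```

To run it on the real file, pass `open('whitepages.txt', 'rb').read()` instead of `sample`. Seeing exactly two code points is the hint that the file is a two-symbol alphabet.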

    Tried: Run strings whitepages.txt hoping to extract embedded printable text directly.

    The strings tool prints runs of at least 4 consecutive printable ASCII bytes. Regular space (0x20) counts as printable, but every run of spaces is interrupted by a 3-byte em-space sequence (E2 80 83), which is not printable ASCII. The file contains nothing but these two characters, so any run that does reach 4 bytes consists only of spaces, and strings prints at most blank-looking lines. The data is encoded as a binary sequence in the whitespace itself, not stored as raw ASCII.
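
The run-length behaviour can be approximated in a few lines. This is a rough sketch, not a reimplementation of strings: `printable_runs` is a hypothetical helper that only treats bytes 0x20 to 0x7E as printable, and the sample is the first row of the xxd output above.

```python
import re

def printable_runs(data: bytes, min_len: int = 4) -> list:
    """Return runs of at least min_len printable ASCII bytes (0x20-0x7E)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return pattern.findall(data)

# First 16 bytes from the xxd dump: the longest run of spaces is 3 bytes.
sample = bytes.fromhex("2020 20e2 8083 20e2 8083 2020 e280 8320")
print(printable_runs(sample))                      # []

# A longer run of spaces is reported, but it is still only spaces.
print(printable_runs(b"     \xe2\x80\x83"))        # [b'     ']
```

Either way, nothing readable comes out, because the message lives in the order of the two symbols rather than in the bytes themselves.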

    Learn more

    Unicode contains many whitespace characters beyond the regular space (U+0020). Common ones used in steganography challenges include: em space (U+2003, UTF-8: E2 80 83), en space (U+2002), thin space (U+2009), and others. All look identical in most text editors.
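
To see how these code points differ at the byte level, the following sketch prints the UTF-8 encoding of each one listed above. `utf8_hex` is a hypothetical helper name; the calls it relies on (`str.encode`, `bytes.hex` with a separator, `unicodedata.name`) are standard library, with the separator argument needing Python 3.8 or later.

```python
import unicodedata

def utf8_hex(code_point: int) -> str:
    """Return the UTF-8 encoding of a code point as space-separated hex."""
    return chr(code_point).encode("utf-8").hex(" ")

# Regular space, en space, em space, thin space.
for cp in (0x0020, 0x2002, 0x2003, 0x2009):
    print(f"U+{cp:04X}  {unicodedata.name(chr(cp)):<10}  {utf8_hex(cp)}")
```

The regular space is a single byte, while the other three are three-byte sequences that differ only in the last byte.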

  2. Step 2
    Map whitespace characters to binary bits
    Observation
    I noticed the xxd output showed exactly two distinct byte sequences (0x20 and the three-byte UTF-8 sequence E2 80 83 for em-space), which suggested treating them as a two-symbol binary alphabet and grouping every 8 symbols into a byte to recover ASCII text.
    Identify the two different whitespace byte sequences. Assign one to bit 0 and the other to bit 1. Group every 8 bits into a byte and convert to ASCII.
    python
    python3 << 'EOF'
    with open('whitepages.txt', 'rb') as f:
        data = f.read()
    
    # Mapping chosen from the xxd output:
    # regular space (0x20) -> 1, em space (E2 80 83) -> 0
    bits = ''
    i = 0
    while i < len(data):
        if data[i:i+3] == b'\xe2\x80\x83':
            bits += '0'
            i += 3
        elif data[i] == 0x20:
            bits += '1'
            i += 1
        else:
            # Skip anything else, such as a trailing newline
            i += 1
    
    # Convert each complete group of 8 bits to an ASCII character
    result = ''
    for j in range(0, len(bits) - 7, 8):
        byte = int(bits[j:j+8], 2)
        result += chr(byte)
    print(result)
    EOF
    What didn't work first

    Tried: Swap the bit assignments so that em-space maps to 1 and regular space maps to 0, then decode the result.

    Reversing the bit mapping inverts every bit, so each decoded byte has its high bit set (value 0x80 or above) and the output is garbage rather than readable ASCII. One reliable check is that every 7-bit ASCII character starts with a 0 bit, so the symbol that begins each group of 8 must be the one that maps to 0. Alternatively, try both assignments and keep the one that produces valid ASCII.
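
Trying both assignments is easy to automate. The sketch below is one possible approach, not the method used in the original solve: `decode` and `decode_either_way` are hypothetical helpers, and the sample input is a made-up two-letter message ("hi") built only to exercise the code.

```python
EM_SPACE = "\u2003"

def decode(text: str, one: str, zero: str) -> bytes:
    """Map two symbols to bits and pack each complete group of 8 into a byte."""
    bits = "".join("1" if ch == one else "0" for ch in text if ch in (one, zero))
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))

def decode_either_way(text: str) -> bytes:
    """Try both assignments and return the one that is pure 7-bit ASCII."""
    for one, zero in ((" ", EM_SPACE), (EM_SPACE, " ")):
        candidate = decode(text, one, zero)
        if candidate and all(b < 0x80 for b in candidate):
            return candidate
    raise ValueError("neither mapping decodes to 7-bit ASCII")

# Made-up sample: "hi" is 01101000 01101001, with space as 1 and em space as 0.
sample = "".join(" " if bit == "1" else EM_SPACE for bit in "0110100001101001")
print(decode_either_way(sample))   # b'hi'
```

Because inverting a 7-bit ASCII byte always sets its high bit, at most one of the two mappings can pass the check.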

    Tried: Use the SNOW steganography tool (stegsnow) to decode the file, since SNOW is a known whitespace steganography tool.

    SNOW encodes data using only trailing tabs and spaces at the end of lines, and it requires a specific file format with newlines. This challenge encodes data using Unicode em-space versus regular space throughout the body of the file, which is a different scheme entirely. Running stegsnow on this file produces no output or an error because the encoding format does not match what SNOW expects.

    Learn more

    This technique is called whitespace steganography. The SNOW tool and the Whitespace programming language both use similar concepts of encoding information in invisible characters. It is effective against casual inspection but immediately visible under hex analysis.

    The key insight is that while the text looks blank, it encodes a binary string where each character of the message is represented by 8 bits of whitespace characters.
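
Running the scheme in the forward direction shows where the 8 symbols per character come from. This is an illustrative sketch under the same assumption as the decoder above (space is 1, em space is 0); `to_bits` and `encode` are hypothetical names, not taken from the challenge.

```python
EM_SPACE = "\u2003"

def to_bits(message: str) -> str:
    """Return the message as a string of 0s and 1s, 8 bits per ASCII character."""
    return "".join(f"{byte:08b}" for byte in message.encode("ascii"))

def encode(message: str) -> str:
    """Hide the message as whitespace: space for 1, em space for 0."""
    return "".join(" " if bit == "1" else EM_SPACE for bit in to_bits(message))

print(to_bits("p"))                               # 01110000
hidden = encode("p")
print(len(hidden), len(hidden.encode("utf-8")))   # 8 18
```

The letter p (0x70) becomes 8 whitespace symbols but 18 bytes on disk, since each em space takes three bytes in UTF-8.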

Interactive tools
  • Stegall: Drop any file and Stegall runs every applicable steg technique in parallel: LSB sweeps, bit planes, spectrograms, polyglot carving, metadata, whitespace decode, and a 6-layer base/ROT/XOR/zlib cascade. Recursively unpacks results and surfaces flag matches.
  • Hex Viewer: View text or raw hex bytes as an xxd-style hex dump with byte offset, hex columns, and ASCII sidebar. Highlights printable characters and null bytes.
  • Strings Extractor: Pull printable text from any binary, library, or image. ASCII and UTF-16 detection, configurable minimum length, flag-like highlight, no command line needed.

Flag

picoCTF{not_all_spaces_are_created_equal_...}

Per-instance flag. Multiple hash suffixes confirmed across instances (c167040c..., f71be4d2..., etc.). The prefix picoCTF{not_all_spaces_are_created_equal_ is consistent across instances.
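
Since the suffix varies, it can help to pull the flag out of the decoded text by pattern rather than by exact match. The sketch below assumes the decoded output may contain other text around the flag; `find_flags` is a hypothetical helper and the sample string is made up, not a real flag.

```python
import re

# Matches picoCTF{...} with any characters except a closing brace inside.
FLAG_PATTERN = re.compile(r"picoCTF\{[^}]*\}")

def find_flags(decoded: str) -> list:
    """Return every picoCTF{...} token found in the decoded text."""
    return FLAG_PATTERN.findall(decoded)

# Made-up decoded text for illustration only.
print(find_flags("noise picoCTF{example_only} more noise"))
# ['picoCTF{example_only}']
```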

Key takeaway

Unicode defines about two dozen whitespace code points that render as blank space in most text editors and terminals but differ from the ordinary space at the byte level. A file containing only two distinct invisible characters is a binary channel: assign one symbol to 0 and the other to 1, group the bits into 8-bit bytes, and recover the hidden message. The same trick appears in real-world covert communication research, e-book watermarking, and source-code exfiltration where tabs and spaces are used interchangeably.
