Description

While going through FBI servers you find an interesting WAV file. Can you find the flag?

Setup

Download the WAV audio file.

```bash
wget <url>/wave.wav
```

Solution

Step 1
Inspect the sample values
Observation
The description says the file came from FBI servers and hides a flag, so I suspected the samples encode data rather than real sound. Printing the raw values should expose the encoding pattern.
Open wave.wav in Audacity or load it in Python. Notice that all sample values are positive integers in the thousands - for example 2008, 2506, 1508 - and that the last two digits look like random noise. This is not normal audio; the samples encode data.
```bash
python3 << 'EOF'
from scipy.io import wavfile

_, data = wavfile.read("wave.wav")
print("First 6 samples:", data[:6].tolist())

# Truncate each sample to its first two digits to strip the noise
rounded = [int(str(s)[:2]) for s in data]
unique = sorted(set(rounded))
print("Unique rounded values:", unique)
print("Count:", len(unique))
EOF
```
Expected output (after the full decode in Step 2)
```
picoCTF{mU21C_1s_1337_...}
```
What didn't work first
Tried: Open wave.wav in a hex editor and search for the ASCII string 'picoCTF' directly in the file bytes.
The flag is not stored as a raw ASCII string anywhere in the file. It is encoded as quantized amplitude levels across thousands of samples. A hex editor scan finds nothing resembling the flag, because each character is split into two hex nibbles and each nibble is written as the level rank of one sample, not as a literal byte.
Tried: Use steghide or zsteg to extract hidden data from the WAV file.
steghide hides its payload in the low-order bits of image or audio samples and expects a passphrase, while zsteg scans PNG and BMP images for LSB (least-significant-bit) data and does not parse WAV files at all. This challenge does not hide data in the LSBs. It encodes data in the most-significant portion of each sample value as one of 16 discrete amplitude levels. Neither tool knows to look for a rank-based encoding scheme, so neither recovers anything.
Learn more
The raw samples range from roughly 1000 to 8509. The last two digits of each sample are random noise added by the encoder. Chopping them off with int(str(sample)[:2]) leaves exactly 16 distinct two-digit values: 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85. Because there are exactly 16 levels, each one maps to one hexadecimal digit (0 through f). The mapping is by sorted rank: the smallest rounded value (10) represents hex 0, the next (15) represents hex 1, and so on up to 85 representing hex f.
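The rank-based mapping described above can be sketched as a small lookup table. This is an illustration only: the level list is taken from the text, and the names `level_to_hex` and `sample_to_hex` are made up for this example.

```python
# The 16 levels listed above; the rank in sorted order gives the hex digit.
levels = list(range(10, 90, 5))  # [10, 15, 20, ..., 85]
level_to_hex = {level: format(rank, "x") for rank, level in enumerate(sorted(levels))}


def sample_to_hex(sample: int) -> str:
    """Truncate a four-digit sample to its first two digits and look up its hex digit."""
    return level_to_hex[int(str(sample)[:2])]


# The three example samples mentioned in Step 1
for s in (2008, 2506, 1508):
    print(s, "->", sample_to_hex(s))
```

With this table, 2008 truncates to 20 (rank 2), 2506 to 25 (rank 3), and 1508 to 15 (rank 1).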
Step 2
Decode the hex string and convert to ASCII
Observation
Step 1 revealed exactly 16 unique truncated sample values, one for each hexadecimal digit. That suggests each level encodes one hex nibble and the full sample sequence spells out a hex-encoded ASCII string containing the flag.
Truncate each raw sample to its first two digits to strip the noise, then map each rounded value to a hex digit by its position in the sorted list of the 16 unique values. Concatenate all hex digits into one string and decode as ASCII to reveal the flag.
```bash
python3 << 'EOF'
from scipy.io import wavfile

_, data = wavfile.read("wave.wav")

# Strip noise: keep only the first two significant digits
rounded = [int(str(s)[:2]) for s in data]

# Build sorted list of the 16 unique levels
unique = sorted(set(rounded))  # [10, 15, 20, ..., 85]

# Map each rounded value to its hex digit by rank (0-15)
hex_str = "".join(hex(unique.index(v))[2:] for v in rounded)

# Decode hex string to ASCII
flag = bytearray.fromhex(hex_str).decode()
print(flag)
EOF
```
What didn't work first
Tried: Decode the raw sample values directly as ASCII without stripping the noise digits first.
Raw samples range from about 1000 to 8509, which are all multi-digit integers far outside the printable ASCII range (32-126). Feeding them directly to chr() or bytearray() produces garbage or raises an error. The noise stripping step - taking only the first two digits via int(str(s)[:2]) - is required to collapse the 16 noisy levels back to their clean quantized values before the rank lookup.
Tried: Assume the 16 unique values map directly to ASCII characters instead of hex nibbles, and build the string by looking up chr(unique.index(v)) for each sample.
There are only 16 distinct levels, so mapping them to indices 0-15 gives values in the non-printable control-character range. ASCII flag characters like 'p', 'i', 'c' have values well above 15. The correct approach uses the 16 levels as hex digits (0-f), concatenates them into a hex string, and then decodes the whole hex string to ASCII - two samples per output character, not one.
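A small sketch of why the ranks must be paired as hex nibbles. The four ranks below are hypothetical, chosen so that they spell the first two characters of a picoCTF-style flag prefix; they are not read from the challenge file.

```python
# Hypothetical ranks (0-15) for four consecutive samples
ranks = [7, 0, 6, 9]

# Wrong: one rank per character lands in the control-character range
wrong = "".join(chr(r) for r in ranks)
print(repr(wrong), "printable:", wrong.isprintable())

# Right: each rank is one hex digit, and two digits form one byte
hex_str = "".join(format(r, "x") for r in ranks)
right = bytes.fromhex(hex_str).decode("ascii")
print(hex_str, "->", right)
```

The pair 7, 0 forms byte 0x70 and the pair 6, 9 forms byte 0x69, so four samples yield two characters.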
Learn more
The WAV file encodes data not as audio but as a sequence of quantized amplitude levels. Each sample represents one hex nibble. The encoder wrote the flag as a hex string, turned each hex character into one of 16 amplitude levels (10, 15, ..., 85), and appended two digits of random noise to disguise the pattern. Reversing the process - strip the noise by taking the first two digits, look up the rank in the sorted unique list, convert the rank to a hex character - reconstructs the original hex string. Decoding that hex string as ASCII gives the flag.
This is a covert channel: the information lives in the quantized sample values themselves, not in anything the audio is meant to sound like. The audio sounds bizarre, but the discrete staircase pattern in Audacity's waveform view is the telltale sign.
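To make the round trip concrete, here is a minimal sketch of what such an encoder might look like, together with the matching decoder. The challenge's actual encoder is not shown in this writeup, so `encode`, `decode`, the noise range, and the test message are all assumptions for illustration.

```python
import random

# The 16 levels named in the text; index in this list is the hex digit value
LEVELS = list(range(10, 90, 5))  # [10, 15, ..., 85]


def encode(text: str, rng: random.Random) -> list[int]:
    """Hypothetical encoder: one sample per hex digit, plus two noise digits."""
    samples = []
    for digit in text.encode("ascii").hex():
        level = LEVELS[int(digit, 16)]
        samples.append(level * 100 + rng.randrange(100))
    return samples


def decode(samples: list[int]) -> str:
    """Strip the two noise digits, map level to hex digit, decode as ASCII."""
    truncated = [int(str(s)[:2]) for s in samples]
    # Uses the known level table rather than sorted(set(...)), because a
    # short message may not contain all 16 levels and the ranks would shift.
    hex_str = "".join(format(LEVELS.index(v), "x") for v in truncated)
    return bytes.fromhex(hex_str).decode("ascii")


message = "test message"  # placeholder text, not the real flag
samples = encode(message, random.Random(0))
print(samples[:6])
print(decode(samples))
```

The walkthrough's decoder ranks the values it observes instead of using a fixed table. That works on the challenge file because all 16 levels occur in it.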

Flag

picoCTF{mU21C_1s_1337_...}

Raw WAV samples (~1000-8509) encode hex digits as 16 discrete amplitude levels with noise in the last two digits. Strip the noise by taking the first two digits, map each rounded value to its rank in the sorted unique list (0-15), concatenate into a hex string, and decode as ASCII.

Key takeaway

Covert channels encode data in a medium not intended for communication, such as the amplitude levels of an audio file, network packet timing, or unused header fields. The key analytic step is identifying that the observable values cluster into a small discrete set, which signals an intentional encoding scheme rather than natural variation. Recognizing the alphabet size (here 16 levels mapping to hex nibbles) directly reveals the encoding, and the same rank-to-symbol mapping approach applies to any quantized covert channel.
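One way to check for such clustering is to bucket the values and count the distinct buckets. The helper below is a sketch under the assumption that the signal sits in the leading digits, as it does in this challenge. `level_histogram` is a name invented for this example, and the sample list mixes values quoted in the walkthrough with made-up ones.

```python
from collections import Counter


def level_histogram(samples, keep_digits=2):
    """Count how many samples fall into each leading-digit bucket."""
    return Counter(int(str(s)[:keep_digits]) for s in samples)


# 2008, 2506, 1508 and 8509 appear in the walkthrough; 2071 and 1533 are made up
samples = [2008, 2506, 1508, 2071, 1533, 8509]
hist = level_histogram(samples)
print(len(hist), "distinct levels:", sorted(hist))
```

On the real file the count would be 16, which points to one hex digit per sample.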

Surfing the Waves picoCTF 2021 Solution

Related reading

Useful tools for Forensics

What to try next

m00nwalk

m00nwalk2

investigation_encoded_1

investigation_encoded_2

waves over lambda

RED