Description
Decode this substitution cipher. Connect to the server to get the ciphertext.
Setup
Connect to the challenge server.
nc <HOST> <PORT_FROM_INSTANCE>Solution
Walk me through it- Step 1Get the ciphertextConnect with nc. The server prints a paragraph of ciphertext encrypted with a simple substitution cipher (each letter is replaced by a different letter consistently throughout the text). Copy the entire ciphertext.
Learn more
Why the keyspace size is misleading. A monoalphabetic substitution cipher maps each plaintext letter to a unique cipher letter. The total keyspace is
26! ≈ 4 * 10^26- far too large to brute force. But the cipher leaks two structural properties: (1) letter equality is preserved - identical plaintext letters always map to the same ciphertext letter, and (2) letter frequency is preserved- if 'e' is 12.7% of plaintext, then whatever letter 'e' maps to is 12.7% of ciphertext.English letter frequencies (long text).
E 12.7% T 9.1% A 8.2% O 7.5% I 7.0% N 6.7% S 6.3% H 6.1% R 6.0% D 4.3% L 4.0% C 2.8% U 2.8% M 2.4% W 2.4% F 2.2% G 2.0% Y 2.0% P 1.9% B 1.5% V 1.0% K 0.8% J 0.15% X 0.15% Q 0.10% Z 0.07%Index of Coincidence (IoC). A statistical measure: probability that two randomly chosen letters from the text are equal. For English, IoC ~= 0.0667; for uniform random text (26 equally likely letters), IoC = 1/26 ~= 0.0385. A monoalphabetic substitution preserves IoC at the English value (since frequencies are merely relabeled). A polyalphabetic cipher (Vigenere) drops IoC toward 0.0385 because it flattens the distribution. Computing IoC of the ciphertext is the first sanity check that this is a simple substitution.
Bigram and trigram analysis sharpen the attack. The most common English bigrams:
TH HE IN ER AN RE. Trigrams:THE AND ING HER. The single most common word:THE(~3% of all words). Once you guesse -> Xandt -> Yfrom frequency, look at three-letter words starting withYand ending withX: those are likelyTHE, fixing the third letter ash -> ?in one shot. - Step 2Perform frequency analysisCount letter frequencies in the ciphertext. The most frequent cipher letter is probably E, the second most frequent is probably T, and so on. Build a substitution key based on these correspondences.python
python3 << 'EOF' from collections import Counter ciphertext = """PASTE CIPHERTEXT HERE""" ct_letters = [c.lower() for c in ciphertext if c.isalpha()] freq = Counter(ct_letters).most_common() print("Cipher letter frequencies:") for letter, count in freq: print(f" {letter}: {count}") print("\nExpected English order: e t a o i n s h r l d c u m f p g w y b v k x j q z") EOFLearn more
Crib bootstrapping. A crib is a piece of plaintext you know lives in the ciphertext. The flag format
picoCTF{...}guarantees the seven lettersp, i, c, o, C, T, Fappear inside the braces. Find any{...}pattern in the ciphertext - the seven letters inside are exactly that mapping (case is sometimes preserved, sometimes flattened). Lock those substitutions, propagate them throughout the text, and the partial decryption guides the rest.Hill-climbing solver (quipqiup style). Start with a random key. Compute a fitness score: log-likelihood of the partial decryption under English n-gram probabilities (typically quadgram statistics from a corpus). Repeatedly swap two letters in the key, accept the swap if fitness improves, occasionally accept a worse swap (simulated annealing) to escape local optima. After ~5000-10000 iterations the key converges to the real one. quipqiup.com runs this in your browser; expect a clean answer in under 10 seconds for a paragraph of text.
fitness(text) = sum over each 4-letter window: log P(quadgram | English) P(THER | English) ~ 10^-2 P(QXJZ | English) ~ 10^-9 <- garbage - Step 3Decode the flagApply your discovered substitution key to the entire ciphertext. The flag text will appear. Look for the picoCTF{...} pattern.
Learn more
Frequency analysis was first described by Arab mathematician Al-Kindi in the 9th century. It remained the primary cryptanalytic technique for nearly 1000 years, until the invention of polyalphabetic ciphers (Vigenere) made it less effective.
Alternate Solution
Paste the ciphertext into the Frequency Analysis tool on this site to instantly see the letter frequency distribution and start mapping cipher letters to English. For a substitution cipher with word spacing preserved, quipqiup.com can also auto-solve it using a dictionary-based hill-climbing algorithm.
Flag
picoCTF{...}
Frequency analysis on the ciphertext maps cipher letters to English letters - the flag appears in the decrypted text.