Listen for the keystream
Open the source for picoctf-2026-shift-registers and the first thing you see is a comment saying the author just learned about Linear Feedback Shift Registers (LFSRs) in school. The seed is eight bits. The flag is XORed with the LFSR output. That is the entire challenge.
The first time I saw an LFSR in a CTF I went straight to Wikipedia, read about Galois fields and tap polynomials, and lost forty minutes. Then I noticed the seed was 8 bits. There are 256 of those. The whole challenge collapses into a for seed in range(256) loop.
That keeps happening. A challenge looks like serious cryptography, the source mentions GF(2) or polynomial division or some 1990s cipher you have never heard of, and the actual attack is one of three things: the key is too short, the keystream got reused, or the cipher is linear and one matrix solve recovers everything. Stream ciphers in CTFs lose for these three reasons, almost always, across years of picoCTF.
A stream cipher tells you what is wrong with it. The ciphertext is the diagnostic.
This guide is the unifying frame. AES for CTF covered block ciphers and the bugs that come from misusing modes. This is the companion for the other half of CTF crypto: stream ciphers, classical keystream constructions, and everything that is not AES. I will use the term "stream cipher" loosely throughout, the same way picoCTF authors use it. Vigenere is technically a polyalphabetic substitution; OTP is technically not a cipher at all when used correctly. They behave like stream ciphers when they fail, which is the only state we see them in here.
You will leave with three working attacks, a fingerprint table that picks the right one in under a minute, and a clean mental model that connects an 8-bit LFSR, a 6-character Vigenere key, and a reused one-time pad as the same shape of bug.
Stream ciphers in one minute
A stream cipher is a function that turns a short key into an arbitrarily long pseudorandom byte sequence (the keystream), then XORs the keystream with the plaintext to produce the ciphertext.
ciphertext = plaintext XOR keystreamplaintext = ciphertext XOR keystream
That is the whole shape. Every stream cipher in picoCTF is some variation on "here is how I produce the keystream." Ciphers differ in how complicated that keystream generator is, how much state it has, and how it eats the key. The attack always reduces to the same question: can I recover the keystream from the ciphertext alone?
Three things make that easy:
- Short key. If the key is shorter than the message, it has to repeat. Repetition leaks structure.
- Reused keystream. If the same keystream encrypts two messages, XORing the ciphertexts cancels the keystream and leaves a XOR of two plaintexts, which is solvable.
- Linear keystream generator. If the function that makes the keystream is linear over GF(2), a tiny number of output bits recovers the entire generator.
A CTF stream cipher almost always trips at least one of these. Often two. The sections below are organised around which one trips.
Single-byte XOR: the floor
Before the multi-byte attacks, the floor: single-byte XOR. The whole keystream is one byte, repeated. There are 256 of them. Try them all.
from itertools import cyclect = bytes.fromhex("1c0111001f010100061a024b53535009181c")for k in range(256):pt = bytes(c ^ k for c in ct)if pt.startswith(b"picoCTF{") or b"the" in pt:print(k, pt)
That snippet solves a meaningful slice of every CTF's easy-crypto bracket. It is the same engine as Cryptopals set 1 challenge 3 and you will find some flavour of it in picoctf-2019-vault-door-5 and dozens of beginner crypto sets. The reason it matters here is not that the attack is hard; it is that every other attack in this article is a generalisation of it. Repeating-key XOR is single-byte XOR run multiple times in parallel. LFSR cracking is single-byte XOR with a smarter way of guessing the byte. Two-time-pad recovery is single-byte XOR with the alphabet replaced by "letters that produce real English when XORed against your candidate."
Hold onto this floor. The rest of the article is teaching you what to do when there is more than one byte to guess.
Repeating-key XOR and Vigenere: short key, long message
Vigenere is the prototype of every stream-cipher-with-a-short-key bug. The key is a word, say SOLVECRYPTO. The plaintext is a paragraph. To encrypt, you walk the plaintext one character at a time and shift each letter by the next key character's position in the alphabet, looping the key when it runs out. Repeating-key XOR is the same scheme with bytes and XOR instead of letters and addition.
The cipher is a stream cipher. The keystream is the key, repeated forever. The bug is that the key is shorter than the message. Once you know how short, you can attack each repeating column independently with the single-byte technique above. The hard part is figuring out the key length.
Step 1: find the key length
Two classical tricks. Kasiski examination: look for repeated trigrams or longer substrings in the ciphertext. If the same plaintext bigram falls under the same part of the key twice, it produces the same ciphertext. The distance between matching ciphertext substrings is a multiple of the key length. Take the GCD of those distances.
Index of Coincidence (IoC): the probability that two random characters from the text are equal. For English plaintext, IoC is about 0.0667. For uniform random bytes, IoC is about 1/26 = 0.0385. A Vigenere ciphertext with the wrong key length looks more uniform; with the right key length, splitting the ciphertext into L columns and computing the IoC of each column gives you values close to the English IoC. Sweep L from 2 to 20, plot the average column IoC, look for the spike.
def ioc(text):n = len(text)if n < 2:return 0.0counts = {}for c in text:counts[c] = counts.get(c, 0) + 1return sum(v * (v - 1) for v in counts.values()) / (n * (n - 1))def best_keylen(ct, max_len=20):letters = [c for c in ct.lower() if c.isalpha()]scores = []for L in range(1, max_len + 1):cols = [letters[i::L] for i in range(L)]avg = sum(ioc(col) for col in cols) / Lscores.append((L, avg))return sorted(scores, key=lambda x: -x[1])[:5]
Step 2: solve each column
Once L is locked, every L-th character was encrypted under the same key character. That column is a Caesar cipher. Run frequency analysis on it: the most common letter in English is e, so the most common letter in the column probably maps to e, fixing the shift for that column. Do that for all L columns and you have the key. Apply, recover plaintext, find the flag.
For raw-bytes repeating-key XOR (no alphabetic structure), replace "most common letter is e" with a chi-square score against expected English byte frequencies. Try each of 256 key bytes for the column, score, keep the best. This is Cryptopals set 1 challenge 6 line for line.
picoCTF receipts
You will see this attack on:
- picoctf-2022-vigenere : explicit Vigenere with a key. dcode.fr or any Kasiski tool ends it in seconds.
- picoctf-2019-easy1 : a Vigenere-style tabula recta lookup with the key handed to you. The lesson is structural, not cryptanalytic.
- picoctf-2019-java-script-kiddie : repeating-key XOR with the key generated client-side. Read the JS, recover the key.
- picoctf-2024-custom-encryption : a homebrew XOR-and-multiply cipher that reduces to repeating-key XOR after the multiplication is undone.
Vigenere's real lesson is that the cipher dies the moment the key is shorter than the message. That is a much wider category than "Vigenere". Any cipher with a periodic keystream falls to this same one-two: find the period, attack each phase independently. Keep that in your head; the LFSR section is a different way of saying the same thing.
picoCTF{...} embedded somewhere, which is structure enough to bootstrap.LFSR: linear feedback, free keystream
Now the centerpiece. A Linear Feedback Shift Register is a register of n bits where each clock tick shifts everything one position and the new bit at one end is computed as an XOR of selected positions in the register. Those selected positions are the tap polynomial. Output the bit that fell off the other end. That is one keystream bit. Repeat.
Picture a 4-bit LFSR with taps at positions 0 and 3, starting state 0001:
tick state output0 0001 11 1000 02 0100 03 0010 04 1001 15 1100 06 1110 07 1111 18 0111 19 1011 110 0101 111 1010 012 1101 113 0110 014 0011 115 0001 1 <- back to start
That 4-bit register cycles through all 15 non-zero states before repeating, which is the maximum-length sequence (m-sequence) for n=4. With n=8, the cycle is 255. Pick the right tap polynomial and you get a sequence that looks random, has nice statistical properties, runs in two XOR gates of hardware, and was used as the basis of real ciphers for decades.
It is also completely broken once you know how to look at it.
Why LFSRs lose: linearity
Every output bit is a linear combination of the initial state bits over GF(2). That means: collect 2n output bits, and you have 2n linear equations in n unknowns (the initial state). Solve the system. Done. The Berlekamp-Massey algorithm is the elegant version of this; for small n you do not even need it.
For shift-registers specifically, with n=8 and a known plaintext prefix, the brute force is shorter than the algebra:
from itertools import islicedef lfsr_keystream(seed, taps, n_bits):state = seedfor _ in range(n_bits):bit = state & 1yield bit# XOR the tap positions for the new high bitnew_bit = 0for t in taps:new_bit ^= (state >> t) & 1state = (state >> 1) | (new_bit << 7)ct = bytes.fromhex(open('output.txt').read().strip())TAPS = [0, 2, 3, 4] # read these out of chall.pyfor seed in range(1, 256): # seed 0 is degeneratebits = list(islice(lfsr_keystream(seed, TAPS, len(ct) * 8), len(ct) * 8))ks = bytes(sum(b << i for i, b in enumerate(bits[8*j:8*j+8]))for j in range(len(ct)))pt = bytes(c ^ k for c, k in zip(ct, ks))if pt.startswith(b'picoCTF{'):print(seed, pt)break
That is the whole solution to picoctf-2026-shift-registers. Read the taps and the bit-output convention out of the provided chall.py (the bit-pack endianness is the only fiddly part), drop them in, run, get the flag in microseconds. The full challenge writeup walks through it step by step.
Berlekamp-Massey: the algebraic version
When the state size is bigger than 8 bits you cannot brute-force, but you also do not have to. The Berlekamp-Massey algorithm recovers the minimum-length LFSR (both the tap polynomial and the initial state) from 2n consecutive keystream bits in O(n^2) time. You do not have to know the taps. You do not have to know the state size. You feed it bits and it tells you both.
A clean Python implementation, courtesy of the Mollard reference collapsed to the essentials:
def berlekamp_massey(s):n = len(s)C, B = [1] + [0] * n, [1] + [0] * nL, m, b = 0, 1, 1for N in range(n):d = s[N]for i in range(1, L + 1):d ^= C[i] & s[N - i]if d == 1:T = C[:]for i in range(n + 1 - m):C[i + m] ^= B[i]if 2 * L <= N:L, B, m = N + 1 - L, T, 1else:m += 1else:m += 1return L, C[:L + 1]# Returns (degree, polynomial coefficients).# Feed it a list of bits; you need at least 2 * L of them.
To use it on a real challenge: extract enough keystream bits (XOR a known-plaintext prefix against the ciphertext), feed them in, get back the polynomial and length. Now you have the generator. Run it forward to recover the rest of the keystream and decrypt.
For a known-plaintext prefix, eight bytes of picoCTF{ gives you 64 keystream bits. That covers any LFSR up to about n=32 with one-shot recovery and gives plenty of margin for 8- to 16-bit registers. If the challenge does not give you a known plaintext prefix, you can still attack the cipher: guess the first few bytes are ASCII (high bit zero), use that to constrain the keystream, and Berlekamp-Massey will tell you whether your guess was self-consistent.
Real LFSR-based ciphers and why they died
A pure LFSR is too weak to ship. Real designs combine multiple LFSRs through a nonlinear function. A5/1, the GSM stream cipher, uses three LFSRs of lengths 19, 22, and 23, with irregular clocking driven by majority logic. E0, the Bluetooth stream cipher, uses four LFSRs combined through a finite state machine. Both were broken anyway: A5/1 in 2003 by Barkan, Biham, and Keller using ciphertext-only attacks plus the GSM error-correcting redundancy, and E0 by Lu, Meier, and Vaudenay in 2004 with a related-key attack.
The pattern is: linearity is too useful to give up entirely (it is fast, cheap, and well-understood), so designers paste a nonlinear filter on top and hope. Cryptanalysts then find that the filter has its own structure, peel it, and the underlying LFSRs spill out. This is mostly historical now; ChaCha20 ships in TLS, and A5/1 has been replaced by A5/3 (KASUMI) and A5/4 (SNOW 3G). But the historical lesson is the one CTFs teach: do not roll your own anything that uses an LFSR as a primary mixing element.
picoCTF receipts
- picoctf-2026-shift-registers : 8-bit LFSR, taps in the source, 256-seed brute force solves it. The marquee challenge for this section.
- picoctf-2019-seed-spring : not an LFSR, but a closely related "the keystream comes from a predictable PRNG" challenge. Same shape: small-state generator, predictable output, exhaustive search over the seed.
Keystream reuse: two ciphertexts, one stream
The cleanest way to break a stream cipher is for the author to break it for you. Reusing a keystream across two messages does that. The math is one line:
c1 = p1 XOR kc2 = p2 XOR kc1 XOR c2 = p1 XOR p2 # k cancels
You now have the XOR of two unknown plaintexts. If both plaintexts are English (or any structured language), you can recover both. The trick is called crib dragging: guess a string that might appear in p1, XOR it across c1 XOR c2 at every position, and look for output that resembles English. When it does, you have found both the position of your guess in p1 and the corresponding fragment of p2 at that same position.
The classic crib is the literal string " the " (with surrounding spaces). It is short, frequent, and produces obvious garbage when wrong and obvious English when right. Once you find " the " in p1 at offset 17, you also know p2[17:22]. Extend the crib in both directions one character at a time. Often you can read off both messages by pure visual inspection in 10 minutes.
def drag(c1c2, crib):crib = crib.encode()for i in range(len(c1c2) - len(crib) + 1):window = c1c2[i:i + len(crib)]out = bytes(a ^ b for a, b in zip(window, crib))if all(32 <= b < 127 for b in out):print(f"offset {i}: {out}")c1 = bytes.fromhex("...")c2 = bytes.fromhex("...")c1c2 = bytes(a ^ b for a, b in zip(c1, c2))drag(c1c2, " the ")drag(c1c2, " and ")drag(c1c2, "picoCTF{")
That last crib, picoCTF{, is the gift CTF authors keep giving. If either plaintext is a flag, dragging the flag prefix across c1 XOR c2 reveals where the flag sits in one ciphertext, and the corresponding bytes in the other ciphertext fall out for free.
picoCTF receipts
- picoctf-2021-easy-peasy : the canonical reused-OTP challenge. The server uses a 40000-byte key as a one-time pad and tracks an offset. Send 40000 bytes of known plaintext to roll the offset back to zero, then ask the server to encrypt the encrypted flag, which now uses the same keystream the flag was encrypted with. XOR cancels the keystream and the original plaintext appears.
- picoctf-2020-mini-competition-otp-implementation : same pattern in a different wrapper. The takeaway is identical: an OTP that is reused stops being a one-time pad.
Two-time pad is the bug that taught the discipline why "one-time" is in the name. The famous VENONA project decrypted Soviet diplomatic cables in the 1940s and 1950s by exploiting exactly this: the Soviets reused some of their one-time pad pages, and Allied cryptanalysts spent decades reading the traffic. Stream cipher reuse remains the highest-value bug class in deployed crypto. Microsoft Office's Word 97 implementation reused RC4 keystreams across document versions; researchers were able to recover edits and revisions for years before it was fixed.
Fingerprint the cipher in 30 seconds
When a new crypto challenge drops, the first 30 seconds is fingerprinting. The ciphertext tells you almost everything if you know what to look at. This table is the cheat sheet I wish someone had handed me on day one.
| Cipher family | What ciphertext looks like | The tell | Attack | Receipt |
|---|---|---|---|---|
| Caesar / ROT-N | A-Z text, same length as plaintext, word boundaries preserved | IoC near 0.067; 26 candidates | Try all 26 shifts; flag is in one of them | caesar, 13 |
| Substitution | A-Z text, IoC near 0.067, but no shift solves it | Letter frequency mismatched, structure preserved | Frequency analysis; quipqiup.com or hill-climbing | waves-over-lambda, la-cifra-de |
| Vigenere / repeating-key XOR | A-Z text or random bytes; IoC drops | IoC of full text near 0.04; IoC of every L-th char near 0.067 for some L | Kasiski + IoC for L; per-column frequency for the key | vigenere, custom-encryption |
| Single-byte XOR | Random-looking bytes, often hex; same length as plaintext | High entropy; 256-key brute force trivial | Try all 256 bytes; score with English frequency | vault-door-5 |
| LFSR | Random bytes; source provided with shift register code | State size N is small (8-32); known taps | Brute force seed (small N) or Berlekamp-Massey (any N) | shift-registers |
| Predictable PRNG keystream | Random bytes; source uses random.seed() or time() | Seed is small or predictable (epoch second, PID, etc.) | Brute force the seed; replay the PRNG | seed-spring |
| Two-time pad / reused OTP | Two ciphertexts of similar length, same key offset | Same length, same source, hint that key is reused | XOR ciphertexts; crib drag | easy-peasy, otp-implementation |
| Block cipher (ECB/CBC/CTR) | Bytes in fixed-size blocks (16 for AES); ECB shows repeated blocks | Length is multiple of block size; structural patterns at block boundaries | Mode-specific (ECB cut-paste, CBC bit-flip, CTR nonce reuse) | AES for CTF |
Two minutes with this table replaces an hour of poking. Compute the IoC. Look at the source. Count the bytes. Decide which row you are in. Then read the corresponding section.
ioc(), chi_square_english(), a single-byte XOR brute-forcer, a Kasiski search, and the crib-drag from this article. Drop it into ~/.ctf/cipher_kit.pyand import it the moment a crypto challenge appears. Half my crypto solves are now "run the kit, look at the output, write five lines to finish."Where real stream ciphers diverge
The ciphers in this article all lose because they were built to lose. CTF authors choose them because the failure mode is teachable. Real stream ciphers (RC4, ChaCha20, Salsa20) are different in three concrete ways, and the differences are exactly what the broken ones lack.
- Big internal state. RC4 has a 256-byte permutation. ChaCha20 has a 16-word state. There is no exhaustive search over the state, and Berlekamp-Massey does not apply because the state-update is not linear over GF(2).
- Nonlinear update.ChaCha20's round function uses 32-bit additions, XORs, and rotations (the "ARX" primitive set). Additions are not linear over GF(2); they introduce carry. That single decision moves the cipher out of reach of the LFSR-style algebraic attacks.
- Mandatory nonce. ChaCha20 takes a key and a 96-bit nonce. The keystream is a function of both. The nonce is meant to vary; if it does, the keystream never repeats and the two-time-pad attack does not apply. The cipher also fails immediately if the nonce repeats, which is why TLS, WireGuard, and modern protocols all bake nonce uniqueness into the design rather than trusting the user.
You will rarely break ChaCha20 in a CTF. You will sometimes break a ChaCha20 challenge because someone hardcoded the nonce or used a counter that resets. The bug is then a two-time-pad bug, which lives in the previous section. The cipher itself is fine.
One more, briefly. RC4 is the famous-but-deprecated stream cipher used in WEP, early TLS, and a thousand legacy protocols. RC4's keystream has well-known biases: the first 256 bytes leak information about the key, certain byte positions are non-uniform, and related-key attacks broke WEP entirely. RC4 NOMORE recovered HTTP cookies from RC4-protected TLS sessions in 75 hours. RC4 is not present in current TLS. If a CTF puts RC4 in your hands, the standard move is "drop the first 1024 keystream bytes" (the SkipFirst-N defense from the early 2000s) and treat the rest as a generic stream cipher.
The line that separates broken stream ciphers from secure ones is not big or mysterious. It is "state large enough to make exhaustive search impossible, update function nonlinear over GF(2), and a nonce that the protocol enforces uniqueness on." CTF stream-cipher challenges are almost always missing two of those three. Once you see which two, the attack is on the shelf.
Where to go next
The shape of every stream-cipher challenge in this guide is the same: the keystream is shorter, reused, or linear, and the ciphertext shows you which. Three attacks (Kasiski plus per-column frequency for repeating keys, Berlekamp-Massey or seed brute force for linear generators, crib dragging for reused keystreams) cover every receipt I have linked above. They cover most picoCTF stream-cipher challenges going back to 2019.
What this guide deliberately did not cover: AES modes and the bugs that come from misusing them, RSA attacks for the public-key side of the room, hash cracking when the "cipher" turns out to be a hash, and layered encodings when the challenge is base64-inside-hex-inside-ROT13 wearing a cipher costume. Each of those is its own animal. The sibling posts walk through them.
If you only take one thing from this article, take this: when you next open a crypto challenge, do not start with the algebra. Start with the ciphertext. Compute the IoC. Look at the source. Count the bytes. The cipher will tell you what it is. Listen.
Now go solve shift-registers.
Sources and further reading
- Cryptopals Crypto Challenges, Set 1 : single-byte XOR, repeating-key XOR, and the canonical Kasiski-plus-IoC pipeline. Every CTF cryptographer should solve this set once.
- Berlekamp-Massey algorithm (Wikipedia) : the cleanest free reference, with pseudocode and worked examples.
- Barkan, Biham, and Keller, "Instant Ciphertext-Only Cryptanalysis of GSM Encrypted Communication" (CRYPTO 2003) : the A5/1 break. Worth reading once for the historical picture.
- VENONA project (Wikipedia) : the highest-stakes two-time-pad bug in history.
- RC4 NOMORE : practical attack on RC4-in-TLS, the work that finally retired RC4 from web crypto.
- Index of Coincidence (Wikipedia) : Friedman's 1922 paper, still the cleanest exposition.
- quipqiup.com : browser-based hill-climbing solver for monoalphabetic substitutions. Saves time when the cipher is clearly substitution and the key is not in the source.