Introduction to XML
XML (Extensible Markup Language) lets authors define their own tags, unlike HTML, which consists of predefined tags. It is designed to describe data and to focus on what the data is.
In the example below, XML is used to describe email information using tags such as <to>, <from>, <heading> and <body>:
<?xml version='1.0' encoding='UTF-8'?>
<note>
<to>John</to>
<from>Bob</from>
<heading>Asking for Leave</heading>
<body>Hello, This is a leave application</body>
</note>
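As a minimal sketch of how an application might consume this document, the snippet below parses the note with Python's standard-library `xml.etree.ElementTree` and maps each tag to its text. The helper name `note_to_dict` is an illustrative assumption, not part of the original example.

```python
import xml.etree.ElementTree as ET

# The note document from the example above, as bytes (the form in which
# XML typically arrives in a request body).
NOTE_XML = b"""<?xml version='1.0' encoding='UTF-8'?>
<note>
<to>John</to>
<from>Bob</from>
<heading>Asking for Leave</heading>
<body>Hello, This is a leave application</body>
</note>"""


def note_to_dict(xml_bytes):
    """Map each child tag of the root element to its text content."""
    root = ET.fromstring(xml_bytes)
    return {child.tag: child.text for child in root}


note = note_to_dict(NOTE_XML)
print(note["heading"], "from", note["from"], "to", note["to"])
```

The tags carry no built-in meaning; the application decides what `to` or `heading` stands for.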
Building blocks of XML
XML Elements
Elements are the building blocks of an XML document. They act as containers that hold text. An element may contain:
- Other elements
- Attributes
- Entity references, etc.
Example: <title>Hello World</title>
In the above example ‘<title>’ is the element.
XML Attributes
Attributes provide additional information about an element.
Example: <IMG src="abc.jpg"/>
In the above example, ‘src’ is an attribute of the ‘IMG’ element; it provides additional information about that element.
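The difference between an element and an attribute can be seen by reading both from a parsed document. This is a small sketch using Python's standard library; the `IMG` tag is written in self-closing form so that it is well-formed XML on its own.

```python
import xml.etree.ElementTree as ET

title = ET.fromstring("<title>Hello World</title>")
image = ET.fromstring('<IMG src="abc.jpg"/>')

element_name = title.tag      # the element's name
element_text = title.text     # the text the element contains
src_value = image.get("src")  # the value of the 'src' attribute

print(element_name, "|", element_text, "|", src_value)
```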
Document Type Definition (DTD)
A DTD describes the structure of the document and contains element and attribute declarations. The element declarations define the set of elements allowed within the document. The attribute declarations define the set of attributes allowed for each element.
Syntax:
<!DOCTYPE element DTD identifier [declaration1
declaration2
…….. ]>
There are two types of DTD:
Internal DTD | External DTD |
---|---|
Elements are declared within the XML file, inside the <!DOCTYPE> definition. | Elements are declared outside the XML file, and the <!DOCTYPE> definition contains a reference to the DTD file. |
Below is an example of an external DTD:
In the Employee.xml file, there is a reference to the external DTD ‘Employee.dtd’.
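To make the two forms concrete, the sketch below shows one document with an internal DTD and one that only references an external DTD file. The element names inside `<employee>` and the declarations in the internal subset are assumptions made for illustration; the contents of the article's actual Employee.xml are not reproduced here. Python's `xml.etree.ElementTree` is a non-validating parser, so it reads both documents without checking them against the DTD and without fetching `Employee.dtd`.

```python
import xml.etree.ElementTree as ET

# Internal DTD: declarations sit inside the DOCTYPE's square brackets.
INTERNAL_DTD_DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, from)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
]>
<note><to>John</to><from>Bob</from></note>"""

# External DTD: the DOCTYPE only points at a separate file.
# The <name> child element is hypothetical.
EXTERNAL_DTD_DOC = """<?xml version="1.0"?>
<!DOCTYPE employee SYSTEM "Employee.dtd">
<employee><name>John</name></employee>"""

internal_root = ET.fromstring(INTERNAL_DTD_DOC)
external_root = ET.fromstring(EXTERNAL_DTD_DOC)

print(internal_root.tag, external_root.tag)
```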
Entities
Entities are placeholders for values that are reserved or already defined.
For example, the less-than (<) and greater-than (>) symbols are reserved for delimiting tags. Imagine that we have the following text within an element: 10<5. The XML processor will interpret ‘<’ as the start of a tag.
Entities are used to represent such special characters, which would otherwise confuse the XML processor. They are also used to define large blocks of data that need to be repeated throughout the document.
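The `10<5` case can be checked directly. In this sketch, the raw text is rejected as not well-formed, while the same text escaped with the predefined `&lt;` entity parses and round-trips back to the original characters. The helper `is_well_formed` is an illustrative name.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "10<5"


def is_well_formed(xml_text):
    """Return True if the standard-library parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The raw '<' is read as the start of a tag, so parsing fails.
unescaped_ok = is_well_formed("<expr>" + raw_text + "</expr>")

# escape() replaces '<' with the predefined entity '&lt;'.
escaped_doc = "<expr>" + escape(raw_text) + "</expr>"
round_trip = ET.fromstring(escaped_doc).text

print(unescaped_ok, escaped_doc, round_trip)
```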
There are four types of Entities:
- Character Entity: It is used to specify any Unicode character in decimal or hexadecimal format.
Example: – ‘A’ in Unicode is represented as: – &#65; (decimal) or &#x41; (hexadecimal)
- Named Entity: It is used to refer to the entities whose definitions can be found entirely within a document’s DTD.
Syntax: –
<!ENTITY entity-name "entity-value">
Example: –
<!ENTITY chapter "Positive Thinking.">
<!ENTITY page "Page No. 118">
XML Usage: –
<book>&chapter;&page;</book>
- External Entity: It is used to represent the content of an external file. External entities are useful for creating a common reference that can be shared between multiple documents.
Syntax: –
<!ENTITY entity-name SYSTEM "URI/URL">
Example: –
<!ENTITY chapter SYSTEM "https://www.example.com/entities.dtd">
<!ENTITY page SYSTEM "https://www.example.com/entities.dtd">
XML Example: –
<book>&chapter;&page;</book>
- Parameter Entity: It is used to group element and attribute declarations and refer to them as a single entity. It lets us give a name to a collection of elements, attributes or attribute values so that they can be referred to by that name rather than listing all the members every time they are used.
Example: –
<!ENTITY % person-name "Title,firstname,middlename,lastname">
In the above example, %person-name; stands for all the element components – Title (Mr./Mrs.), Firstname, Middlename, Lastname.
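The character and named entities above can be tried out with a short sketch. Here the `chapter` and `page` entities are declared in an internal DTD, and Python's standard-library parser substitutes their values, together with the character references for ‘A’, when it reads the element text.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<?xml version="1.0"?>
<!DOCTYPE book [
  <!ENTITY chapter "Positive Thinking.">
  <!ENTITY page "Page No. 118">
]>
<book>&chapter;&page; &#65; &#x41;</book>"""

book = ET.fromstring(BOOK_XML)

# Named entities are replaced by their declared values, and both
# character references resolve to the letter 'A'.
print(book.text)
```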
External Entities in Action
Let us understand “External Entities” further using a test application that transmits data using XML as shown below:

Now, let’s introduce an external entity in the request, as shown in the screenshot below:
The external entity here contains the path of the “abc.txt” file, which is present locally on the server.

The server parses the XML data and retrieves contents from “abc.txt” file as shown below:

In the above demonstration, the following XML code fetched the abc.txt file from the server’s local file system and displayed its contents to the user of the application.
<!ENTITY xxe SYSTEM "file:///f:/abc.txt" >]>
The SYSTEM keyword used with external entities causes XML parsers to read data from a URI and substitute it into the document.
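The substitution step can be observed without reading any file. In this sketch, Python's low-level `xml.parsers.expat` parser is given a resolver hook; when it meets `&xxe;` it passes the SYSTEM URI to that hook. A vulnerable configuration would open the URI at this point, whereas the hook here only records it. The root element name `foo` and the surrounding DOCTYPE are assumptions added to make the fragment a complete document.

```python
from xml.parsers import expat

XXE_DOC = """<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///f:/abc.txt">
]>
<foo>&xxe;</foo>"""

requested_uris = []


def on_external_entity(context, base, system_id, public_id):
    # A vulnerable resolver would open system_id here and feed its
    # contents to the parser in place of &xxe;. This one only logs it.
    requested_uris.append(system_id)
    return 1  # non-zero tells expat the reference was handled


parser = expat.ParserCreate()
parser.ExternalEntityRefHandler = on_external_entity
parser.Parse(XXE_DOC, True)

print(requested_uris)
```

Whether a given parser performs this lookup by default depends on the library and its settings.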
XML External Entity Attack
As XML External Entity (XXE) processing makes it possible to declare and use external files, it can be misused by an attacker to: –
- Read local files on the server
- Access internal network
- Execute commands on a remote server
- Read sensitive data and system files on a local machine
Such an attack is called an XXE attack. In this way, any file on the remote server (or more precisely, any file that the web server has read access to) could be obtained.

Fig: Explaining attack scenario of XXE attack.
To exploit XXE, we will now try to access a sensitive file “service_log.txt” present on the server as shown below: –

We can observe that the contents of the service_log.txt file are displayed, as shown below:

This way XXE can be exploited to retrieve any file information from the server.
In our case, the XXE attack is possible because the XML parsing code written in the “aspx.cs” file accepts external entities.
using System.IO;
using System.Xml;
// 'data' holds the XML submitted by the client
StreamReader stream = new StreamReader(data);
XmlReaderSettings settings = new XmlReaderSettings();
// Vulnerable: the DTD, including entity declarations, is processed
settings.DtdProcessing = DtdProcessing.Parse;
XmlReader xmlReader = XmlReader.Create(stream, settings);
Here, “settings.DtdProcessing = DtdProcessing.Parse;” enables the parser to process the DTD along with the XML, which leads to this attack. No validation is performed to restrict external entities to trusted sources.
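For comparison, the opposite policy (refusing any document that carries a DTD) can be sketched in Python. This is an illustration of the idea rather than the application's code: the names `DTDForbidden` and `parse_without_dtd` are hypothetical, and the check uses a first pass with the standard-library expat parser before handing the text to `ElementTree`.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when a document carries a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    raise DTDForbidden("DOCTYPE declarations are not accepted: " + name)


def parse_without_dtd(xml_text):
    """Parse xml_text, refusing any document that declares a DTD."""
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    checker.Parse(xml_text, True)  # raises DTDForbidden on <!DOCTYPE ...>
    return ET.fromstring(xml_text)


plain = parse_without_dtd("<note><to>John</to></note>")

payload = (
    '<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///f:/abc.txt">]>'
    "<foo>&xxe;</foo>"
)
try:
    parse_without_dtd(payload)
    rejected = False
except DTDForbidden:
    rejected = True

print(plain.find("to").text, rejected)
```

Rejecting the DOCTYPE outright is the simplest rule when the application has no legitimate need for DTDs.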
Advanced XXE Exploitation
Depending on how the XML is parsed and used in the application, different attack scenarios arise. Some insecure cases might lead to attacks such as:
- Downloading and storing malicious content on the server
- Remote code execution, etc.
Let’s consider a case where the application parses and stores the XML data on the server.
We can exploit this case by using an HTML file from a remote server, containing malicious content, as an external entity. The XML is then parsed and stored on the server along with the malicious HTML data.
The request sent to the server is shown below:

The server now parses the XML data and stores it on the server in the “out.xml” file, as shown below. We can see that the file stores the contents of the referenced HTML file.

When the server later reads such content from its XML file or a database and displays it on other pages of the application, the malicious content is also rendered to the users.
In our case, the content of the HTML file retrieved from the remote site is stored and rendered on the user’s screen. As the file contains JavaScript, it gets executed, as shown below.

This shows that XXE can be used to mount different kinds of web attacks.
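The script runs because the stored content is written into the page as markup. As a hedged sketch of the usual countermeasure on the rendering side, the snippet below HTML-escapes stored text before placing it in a page, so the browser displays it as text. The function name `render_stored_text` and the sample payload are illustrative assumptions; this complements disabling external entities in the parser and does not replace it.

```python
import html


def render_stored_text(stored_text):
    """Wrap stored text in a paragraph, escaping any markup it contains."""
    return "<p>" + html.escape(stored_text) + "</p>"


# Hypothetical content that reached storage through an external entity.
stored = "<script>alert('XXE')</script>"
page_fragment = render_stored_text(stored)

print(page_fragment)
```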
Vulnerable XML parsing code
ASP.NET
Insecure XML parsing: Below is the code for XmlReader API used in ASP.NET for parsing XML data that allows XXE. Similarly, other APIs are also prone to XXE attack.
StreamReader stream = new StreamReader(data);
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Parse;
XmlReader xmlReader = XmlReader.Create(stream, settings);
Secure XML parsing:
StreamReader stream = new StreamReader(data);
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Ignore;
XmlReader xmlReader = XmlReader.Create(stream, settings);
PHP
Insecure XML parsing: Below is the code for DOMDocument API used in PHP for parsing XML data that allows XXE.
libxml_disable_entity_loader(false);
$postData = utf8_encode(file_get_contents('php://input'));
$dom = new DOMDocument();
$dom->loadXML($postData, LIBXML_NOENT | LIBXML_DTDLOAD);
$items = simplexml_import_dom($dom);
Secure XML parsing:
libxml_disable_entity_loader(true);
$postData = utf8_encode(file_get_contents('php://input'));
$dom = new DOMDocument();
$dom->loadXML($postData, LIBXML_NOENT | LIBXML_DTDLOAD);
$items = simplexml_import_dom($dom);
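Whichever platform is used, a parser configuration can be checked with a small regression test: write a marker string to a temporary file, reference that file through an external entity, and confirm the marker never appears in the parsed output. The sketch below does this in Python against the standard-library `ElementTree` parser; the names `leaks_external_entity`, `stdlib_parse` and the `CANARY` value are hypothetical, and a parser that rejects the document outright is treated as not leaking.

```python
import pathlib
import tempfile
import xml.etree.ElementTree as ET

CANARY = "canary-7f3a"  # hypothetical marker written to the temp file


def leaks_external_entity(parse_to_text):
    """Return True if parse_to_text expands a file:// external entity."""
    with tempfile.TemporaryDirectory() as tmp:
        secret = pathlib.Path(tmp).resolve() / "secret.txt"
        secret.write_text(CANARY)
        payload = (
            '<?xml version="1.0"?>\n'
            '<!DOCTYPE foo [<!ENTITY xxe SYSTEM "' + secret.as_uri() + '">]>\n'
            "<foo>&xxe;</foo>"
        )
        try:
            return CANARY in parse_to_text(payload)
        except Exception:
            # The parser refused the document, so nothing was disclosed.
            return False


def stdlib_parse(xml_text):
    """Parse with ElementTree and serialise the result back to text."""
    return ET.tostring(ET.fromstring(xml_text), encoding="unicode")


leaked = leaks_external_entity(stdlib_parse)
print(leaked)
```

Swapping `stdlib_parse` for a wrapper around the parser under test turns this into a check that fails if external entities are ever expanded.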