Blogs (Finding XML Vulnerabilities in Code) • Hackstack Security

Introduction to XML

In case you're not already aware, XML (Extensible Markup Language) is a markup language similar to HTML, but without predefined tags to use. Instead, you define your own tags designed specifically for your needs. This is a powerful way to store data in a format that can be stored, searched, and shared.

~ Mozilla

In short, XML was only designed to store and transport data. But this won't stop us from learning more about this markup language and finding different ways to exploit it. Since XML is widely used on the internet, we need to understand the different vulnerabilities that can arise in code when an XML parser is not properly configured or used.

This code review series aims to do exactly that - finding vulnerabilities in code and improving your security code review skills as a beginner. Excited? Let's jump straight into it!

Common XML Vulnerabilities in the wild

Now that we understand what XML is, let's jump to the part everyone clicked on this article for:

Finding different XML vulnerabilities in code during a security code review!

If the source code you are reviewing uses an XML parser, it is important to review how that parser is configured and used to make sure it is safe from XML vulnerabilities.

What is an XML parser you ask?

An XML parser is a software library/package that provides an interface for client applications to work with an XML document. The XML Parser is designed to read the XML and create a way for programs to use XML in their code.

XML parsers check that the document is well-formed, and validating parsers additionally check it against a DTD or schema.

There are different types of XML parsers available for different programming languages for developers to choose from. Below are a few XML parsers in C++:

TinyXML
PugiXML
libxml++
xerces-C++

We will be working with the Xerces-C++ library (XercesDOMParser) in this blog post.

The following are a few well-known vulnerabilities found in XML parser implementations that are not configured properly:

XXE (XML External Entities) attack
Billion Laughs Attack
Quadratic Blowup Attack

We will discuss what these vulnerabilities are (in short) and how to find them during a code review assessment as well as mitigations for these vulnerabilities.

Uncovering these vulnerabilities

We've covered the basics of XML, what XML parsers are, and their types. We also saw the different vulnerabilities that could exist in code where an XML parser is used. Now it's time to know more about these vulnerabilities and how we can find them during a security code review assessment.

XXE (XML External Entities) Attack

An XML External Entity attack is a type of attack against an application that parses XML input. This attack occurs when XML input containing a reference to an external entity is processed by a weakly configured XML parser.

External entities are particularly interesting from a security perspective because they allow an entity to be defined based on the contents of a file path or URL.

This attack may lead to the disclosure of confidential data, denial of service, server-side request forgery, port scanning from the perspective of the machine where the parser is located, and other system impacts.

Example of an XXE attack that discloses the '/etc/passwd' file (any file readable by the application could be targeted):

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
    <!ELEMENT foo ANY >
    <!ENTITY xxe SYSTEM "file:///etc/passwd" >]>
<foo>&xxe;</foo>

If you are not aware of what an XXE vulnerability is, I highly recommend getting familiar with it first, since this post focuses on finding these issues in code. PortSwigger provides a good overview of this vulnerability and of what XML entities are.

Assuming we know more about XXE vulnerabilities, we will now learn to find this vulnerability in our C++ code.

As mentioned above, we will be using Apache Xerces-C++'s XercesDOMParser in this post for demonstration. We will look at a code snippet showing an “Insecure” configuration of the XML parser (which gives rise to an XXE vulnerability) as well as a “Secure” configuration that avoids it.

Insecure Code:

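#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/XMLUni.hpp>

using namespace xercesc;

// Creates a parser that keeps the default entity handling:
// entities referenced by the document are resolved automatically.
XercesDOMParser* createInsecureParser() {
  XercesDOMParser* parser = new XercesDOMParser();
  parser->setValidationScheme(XercesDOMParser::Val_Auto);
  return parser;
}

int main() {
  // Xerces-C++ must be initialized before any parser is created.
  XMLPlatformUtils::Initialize();

  XercesDOMParser* insecureParser = createInsecureParser();
  insecureParser->parse("insecure.xml");
  // Process the XML document

  delete insecureParser;
  XMLPlatformUtils::Terminate();
  return 0;
}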

The above is an insecure use of XercesDOMParser that gives rise to XXE vulnerabilities. The program simply reads and parses an XML file, and because no entity-related flags are set, the parser keeps its default behavior.

In this code, the program is set up to process everything in the XML file, including the special instructions called entities (as discussed earlier).

Unfortunately, this openness is risky. If the XML file contains entities that refer to external sources, the parser will fetch them. This is like opening the door wide without checking who is knocking. An attacker can craft XML input with malicious external entities to fetch confidential data or perform the other malicious actions discussed above.

Let's look at how to avoid XXE vulnerability with just a couple of flags (in xerces-c++).

Secure Code:

#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/XMLUni.hpp>

using namespace xercesc;

XercesDOMParser* createSecureParser() {
  XercesDOMParser* parser = new XercesDOMParser();

  // Set secure flags to prevent XXE vulnerability
  parser->setCreateEntityReferenceNodes(true);
  parser->setDisableDefaultEntityResolution(true);

  parser->setValidationScheme(XercesDOMParser::Val_Auto);
  return parser;
}

int main() {
  // Xerces-C++ must be initialized before any parser is created.
  XMLPlatformUtils::Initialize();

  XercesDOMParser* secureParser = createSecureParser();
  secureParser->parse("secure.xml");
  // Process the XML document

  delete secureParser;
  XMLPlatformUtils::Terminate();
  return 0;
}

Now, picture the same program as above, but this time, we've added some safety measures. We told the program to be more careful when reading the XML file by adding the below two flags:

// Set secure flags to prevent XXE vulnerability
parser->setCreateEntityReferenceNodes(true);
parser->setDisableDefaultEntityResolution(true);

setCreateEntityReferenceNodes(true) - This method allows the user to specify whether the parser should create entity reference nodes in the DOM tree being produced. When the flag is true, the parser will create EntityReference nodes in the DOM tree. The EntityReference nodes and their child nodes will be read-only. When the flag is false, no EntityReference nodes will be created. This flag needs to be set to “true” to avoid XXE.

setDisableDefaultEntityResolution(true) - This method gives users the option to not perform default entity resolution. If the user's resolveEntity method returns NULL the parser will try to resolve the entity on its own. When this option is set to true, the parser will not attempt to resolve the entity when the resolveEntity method returns NULL. Again, this flag needs to be set to “true” to avoid XXE.

Ideally, the safest way to prevent XXE is to disable DTD processing completely, which also rules out external entities, since entities are declared in the DTD. Depending on the parser or use case, this may or may not be possible.
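When the parser itself cannot be told to refuse DTDs, one option is to reject such input before it ever reaches the parser. The sketch below is only an illustration of that idea: `hasDoctypeDeclaration` is a hypothetical helper, not part of Xerces-C++, and it assumes the input is in an ASCII-compatible encoding such as UTF-8. It is a plain substring check, so it also rejects harmless documents that merely mention the keyword in a comment or CDATA section, and it should be treated as an extra layer on top of the parser flags rather than a replacement for them.

```cpp
#include <string>

// Hypothetical pre-parse check (not a Xerces-C++ API).
// Returns true when the raw XML text contains a document type declaration.
// Assumption: the text is in an ASCII-compatible encoding such as UTF-8;
// a UTF-16 document would have to be decoded before this check is meaningful.
// XML keywords are case-sensitive, so only the upper-case form is searched for.
bool hasDoctypeDeclaration(const std::string& xml) {
    return xml.find("<!DOCTYPE") != std::string::npos;
}
```

An application that never expects a DTD could call this on the raw request body and refuse the document when it returns true, before handing anything to the parser.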

Disabling DTDs also protects the parser against denial-of-service (DoS) attacks such as Billion Laughs. If it is not possible to disable DTDs completely, then external entities and external document type declarations must be disabled in a way that's specific to each parser.
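To get a feel for why Billion Laughs is a denial-of-service problem, it helps to do the arithmetic. The widely circulated form of the payload defines a short base entity and then several further entities, each of which references the previous one ten times. The function below is a rough sketch of that calculation; `expandedSize` and its parameters are illustrative names, and it assumes the parser applies no expansion limit.

```cpp
#include <cstdint>

// Size in bytes of the fully expanded top-level entity when a base string of
// `baseBytes` bytes is referenced `fanout` times per level, across `levels`
// nested entity definitions. Illustrative only: it assumes the parser has no
// expansion limit and does not guard against integer overflow.
std::uint64_t expandedSize(std::uint64_t baseBytes,
                           std::uint64_t fanout,
                           unsigned levels) {
    std::uint64_t size = baseBytes;
    for (unsigned i = 0; i < levels; ++i) {
        size *= fanout;  // each level repeats the previous level `fanout` times
    }
    return size;
}
```

With a 3-byte base string, a fan-out of 10 and 9 nested levels, `expandedSize(3, 10, 9)` is 3,000,000,000 bytes, so a payload of under a kilobyte asks the parser to build roughly 3 GB of text.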

With the above checks, we're telling the program to check the identity of whoever is knocking on the door before letting them in. This way, we reduce the risk of potential security problems that could arise from malicious instructions in the XML file.

In conclusion, understanding and mitigating XXE vulnerabilities is crucial for ensuring the security of your web applications. If you're concerned about the presence of XXE in your application or want to enhance your overall security posture, remember that we at HackStack Security specialize in identifying and addressing such issues.

Don't hesitate to get in touch with us today - safeguard your applications and data with our expert assistance.

Muqsit Baig
