XML Output (-oX)

XML, the extensible markup language, has its share of critics as well as plenty of zealous proponents. I was long in the former group, and only grudgingly incorporated XML into Nmap after volunteers performed most of the work. Since then, I have learned to appreciate the power and flexibility that XML offers, and even wrote this book in the DocBook XML format. I strongly recommend that programmers interact with Nmap through the XML interface rather than trying to parse the normal, interactive, or grepable output. The XML format includes more information than the others and is extensible enough that new features can be added without breaking existing programs that use it. It can be parsed by standard XML parsers, which are available for all popular programming languages, usually for free. Editors, validators, transformation systems, and many other applications already know how to handle the format. Normal and interactive output, on the other hand, are custom to Nmap and subject to regular changes as I strive for a clearer presentation to end users. Grepable output is also Nmap-specific and tougher to extend than XML. It is considered deprecated, and many Nmap features such as MAC address detection are not presented in this output format.

An example of Nmap XML output is shown in Example 13.9. Whitespace has been adjusted for readability. In this case, XML was sent to stdout thanks to the -oX - construct. Some programs executing Nmap opt to read the output that way, while others direct the output to a file and read that file after Nmap completes.
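A program that reads the XML from stdout might be structured along the lines of the following minimal Python sketch. It assumes that the nmap binary is on the PATH. The function name run_and_parse is hypothetical, and error handling is reduced to raising an exception if the command fails.

```python
import subprocess
import xml.etree.ElementTree as ET


def run_and_parse(cmd):
    """Run a command that writes Nmap XML to stdout; return the parsed root element.

    run_and_parse is a hypothetical helper name, not part of Nmap.
    """
    # check=True raises CalledProcessError if the command exits with an error.
    result = subprocess.run(cmd, capture_output=True, check=True)
    # Passing the raw bytes lets the XML parser deal with the document encoding.
    return ET.fromstring(result.stdout)


# Example call (requires nmap to be installed); "-oX -" sends the XML to stdout:
#   root = run_and_parse(["nmap", "-T4", "-A", "-p", "1-1000",
#                         "-oX", "-", "scanme.nmap.org"])
#   print(root.get("version"), root.get("args"))
```

Passing the command as a list rather than a single string avoids shell quoting issues. A program that prefers the file-based approach could instead give a filename to -oX and call ET.parse on that file once Nmap exits.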

Example 13.9. An example of Nmap XML output
# nmap -T4 -A -p 1-1000 -oX - scanme.nmap.org
<?xml version="1.0"?>
<?xml-stylesheet href="file:///usr/local/bin/../share/nmap/nmap.xsl" type="text/xsl"?>
<!-- Nmap 5.59BETA3 scan initiated Fri Sep  9 18:33:41 2011 as:
     nmap -T4 -A -p 1-1000 -oX - scanme.nmap.org -->
<nmaprun scanner="nmap" args="nmap -T4 -A -p 1-1000 -oX - scanme.nmap.org" start="1315618421"
         startstr="Fri Sep  9 18:33:41 2011" version="5.59BETA3" xmloutputversion="1.03">
 <scaninfo type="syn" protocol="tcp" numservices="1000" services="1-1000"/>
 <verbose level="0"/>
 <debugging level="0"/>
 <host starttime="1315618421" endtime="1315618434">
  <status state="up" reason="echo-reply"/>
  <address addr="74.207.244.221" addrtype="ipv4"/>
  <hostnames>
   <hostname name="scanme.nmap.org" type="user"/>
   <hostname name="li86-221.members.linode.com" type="PTR"/>
  </hostnames>
  <ports>
   <extraports state="closed" count="997">
    <extrareasons reason="resets" count="997"/>
   </extraports>
   <port protocol="tcp" portid="22">
    <state state="open" reason="syn-ack" reason_ttl="53"/>
    <service name="ssh" product="OpenSSH" version="5.3p1 Debian 3ubuntu7"
             extrainfo="protocol 2.0" ostype="Linux" method="probed" conf="10">
     <cpe>cpe:/a:openbsd:openssh:5.3p1</cpe>
     <cpe>cpe:/o:linux:kernel</cpe>
    </service>
    <script id="ssh-hostkey"
            output="1024 8d:60:f1:7c:ca:b7:3d:0a:d6:67:54:9d:69:d9:b9:dd (DSA)&#xa;
                    2048 79:f8:09:ac:d4:e2:32:42:10:49:d3:bd:20:82:85:ec (RSA)"/>
   </port>
   <port protocol="tcp" portid="80">
    <state state="open" reason="syn-ack" reason_ttl="53"/>
    <service name="http" product="Apache httpd" version="2.2.14"
             extrainfo="(Ubuntu)" method="probed" conf="10">
     <cpe>cpe:/a:apache:http_server:2.2.14</cpe>
    </service>
    <script id="http-title" output="Go ahead and ScanMe!"/>
   </port>
  </ports>
  <os>
   <portused state="open" proto="tcp" portid="22"/>
   <portused state="closed" proto="tcp" portid="1"/>
   <portused state="closed" proto="udp" portid="31289"/>
   <osclass type="general purpose" vendor="Linux" osfamily="Linux"
            osgen="2.6.X" accuracy="100">
    <cpe>cpe:/o:linux:linux_kernel:2.6.39</cpe>
   </osclass>
   <osmatch name="Linux 2.6.39" accuracy="100" line="39278"/>
  </os>
  <uptime seconds="23450" lastboot="Fri Sep  9 12:03:04 2011"/>
  <distance value="11"/>
  <tcpsequence index="199" difficulty="Good luck!"
               values="49018209,48C3EBED,495A2E7F,493EF30C,48ED43B3,495A9B0C"/>
  <ipidsequence class="All zeros" values="0,0,0,0,0,0"/>
  <tcptssequence class="1000HZ"
                 values="165CC09,165CC6E,165CCD2,165CD36,165CD9A,165CE48"/>
  <trace port="256" proto="tcp">
   <!-- Several hop elements removed for brevity -->
   <hop ttl="9" ipaddr="72.52.92.109" rtt="15.69" host="10gigabitethernet1-1.core1.fmt1.he.net"/>
   <hop ttl="10" ipaddr="64.62.250.6" rtt="12.06" host="linode-llc.10gigabitethernet2-3.core1.fmt1.he.net"/>
   <hop ttl="11" ipaddr="74.207.244.221" rtt="16.55" host="li86-221.members.linode.com"/>
  </trace>
  <times srtt="26517" rttvar="19989" to="106473"/>
 </host>
 <runstats>
  <finished time="1315618434" timestr="Fri Sep  9 18:33:54 2011" elapsed="13.66"
            summary="Nmap done at Fri Sep  9 18:33:54 2011; 1 IP address (1 host up)
                     scanned in 13.66 seconds" exit="success"/>
  <hosts up="1" down="0" total="1"/>
 </runstats>
</nmaprun>

Another advantage of XML is that its verbose nature makes it easier to read and understand than other formats. Readers familiar with Nmap in general can likely understand most of the XML output in Example 13.9, “An example of Nmap XML output” without further documentation. The grepable output format, on the other hand, is tough to decipher without its own reference guide.

There are a few aspects of the example XML output which may not be self-explanatory. For example, look at the two port elements in Example 13.10.

Example 13.10. Nmap XML port elements
<port protocol="tcp" portid="22">
 <state state="open" reason="syn-ack" reason_ttl="56"/>
 <service name="ssh" product="OpenSSH" version="4.3" extrainfo="protocol 2.0"
          method="probed" conf="10"/>
 <script id="ssh-hostkey"
       output="1024 60:ac:4d:51:b1:cd:85:09:12:16:92:76:1d:5d:27:6e (DSA)&#xa;
               2048 2c:22:75:60:4b:c3:3b:18:a2:97:2c:96:7e:28:dc:dd (RSA)"/>
</port>
<port protocol="tcp" portid="113">
 <state state="closed" reason="reset" reason_ttl="56"/>
 <service name="auth" method="table" conf="3"/>
</port>

The port protocol, ID (port number), state, and service name are the same as would be shown in the interactive output port table. The service product, version, and extrainfo attributes come from version detection and are combined together into one field of the interactive output port table. The method and conf attributes aren't present in any other output type. The method can be table, meaning the service name was simply looked up in nmap-services based on the port number and protocol, or it can be probed, meaning that it was determined through the version detection system. The conf attribute measures the confidence Nmap has that the service name is correct. The values range from one (least confident) to ten (most confident). Nmap has a confidence level of only 3 for ports determined by table lookup, while it is highly confident (level 10) that port 22 of Example 13.10, “Nmap XML port elements” is OpenSSH, because Nmap connected to the port and found an SSH server identifying as OpenSSH.
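The attributes just described can be pulled out with any XML parser. The following Python sketch uses the standard library and the two port elements from Example 13.10. The wrapping ports element is added here only to give the fragment a single root, as in the full output, and the function name describe_services is hypothetical.

```python
import xml.etree.ElementTree as ET

# The two port elements of Example 13.10, wrapped in <ports> to form one document.
SAMPLE = """<ports>
<port protocol="tcp" portid="22">
 <state state="open" reason="syn-ack" reason_ttl="56"/>
 <service name="ssh" product="OpenSSH" version="4.3" extrainfo="protocol 2.0"
          method="probed" conf="10"/>
</port>
<port protocol="tcp" portid="113">
 <state state="closed" reason="reset" reason_ttl="56"/>
 <service name="auth" method="table" conf="3"/>
</port>
</ports>"""


def describe_services(ports_element):
    """Return one dict per port with service name, details, method, and confidence."""
    rows = []
    for port in ports_element.iter("port"):
        service = port.find("service")
        if service is None:
            continue  # no service element was reported for this port
        # product, version, and extrainfo are optional; join whichever are present.
        details = " ".join(
            service.get(key)
            for key in ("product", "version", "extrainfo")
            if service.get(key)
        )
        rows.append({
            "port": int(port.get("portid")),
            "name": service.get("name"),
            "details": details,
            "method": service.get("method"),
            "conf": int(service.get("conf")),
        })
    return rows


rows = describe_services(ET.fromstring(SAMPLE))
# Keep only the ports whose service name came from version detection.
probed = [row["port"] for row in rows if row["method"] == "probed"]
print(probed)  # prints [22]
```

A program that cares about accuracy might filter on conf rather than method, for instance by treating anything below some threshold of its own choosing as a guess.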

One other aspect that some users find confusing is that the attributes /nmaprun/@start and /nmaprun/runstats/finished/@time hold timestamps given in Unix time, the number of seconds since January 1, 1970 UTC. This is often easier for programs to handle. For the convenience of human readers, versions 3.78 and newer include the equivalent calendar time written out in the attributes /nmaprun/@startstr and /nmaprun/runstats/finished/@timestr.
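Converting the Unix timestamps is a one-line operation in most languages. The Python sketch below uses the start and finish values from Example 13.9, trimmed to the relevant attributes, and prints the start time in UTC. The calendar strings in the example show 18:33:41, seven hours earlier than the UTC value, which suggests that they are written in the scanning machine's local time zone.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

# Trimmed from Example 13.9: only the timestamp attributes are kept.
SAMPLE = """<nmaprun start="1315618421" startstr="Fri Sep  9 18:33:41 2011">
 <runstats>
  <finished time="1315618434" timestr="Fri Sep  9 18:33:54 2011" elapsed="13.66"/>
 </runstats>
</nmaprun>"""

root = ET.fromstring(SAMPLE)
finished_element = root.find("runstats/finished")

# Unix time counts seconds since 1970-01-01 00:00:00 UTC.
start = datetime.fromtimestamp(int(root.get("start")), tz=timezone.utc)
finished = datetime.fromtimestamp(int(finished_element.get("time")), tz=timezone.utc)

print(start.isoformat())                    # prints 2011-09-10T01:33:41+00:00
print((finished - start).total_seconds())   # prints 13.0
```

The whole-second difference of 13 seconds is coarser than the elapsed attribute (13.66), so programs that need the scan duration may prefer to read elapsed directly.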

The original command line (argv array) is stored in the attribute /nmaprun/@args. Arguments are separated by whitespace. Arguments that originally contained whitespace are enclosed in double quotes (which appear as &quot; in the XML). Individual characters can also be escaped with backslashes within quoted strings.
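To recover something close to the original argv array, a program can split the args value with a shell-style tokenizer. The sketch below uses Python's shlex module on a hypothetical args value in which the output filename contains a space. Note that shlex follows POSIX shell rules, which also give single quotes a special meaning, so this is an approximation of Nmap's quoting rather than an exact inverse.

```python
import shlex
import xml.etree.ElementTree as ET

# Hypothetical args value: the output filename contains a space, so it is quoted.
SAMPLE = '<nmaprun args="nmap -oX &quot;my scan.xml&quot; scanme.nmap.org"/>'

root = ET.fromstring(SAMPLE)
args = root.get("args")   # the XML parser has already turned &quot; back into "
argv = shlex.split(args)  # splits on whitespace, honoring double quotes and backslashes
print(argv)  # prints ['nmap', '-oX', 'my scan.xml', 'scanme.nmap.org']
```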

Nmap includes a document type definition (DTD) which allows XML parsers to validate Nmap XML output. While it is primarily intended for programmatic use, it can also help humans interpret Nmap XML output. The DTD defines the legal elements of the format, and often enumerates the attributes and values they can take on. It is reproduced in Appendix A, Nmap XML Output DTD.

Using XML Output

The Nmap XML format can be used in many powerful ways, though few users actually take any advantage of it. I believe this is due to many users' inexperience with XML, combined with a lack of practical, solution-oriented documentation on using the Nmap XML format. This chapter provides several practical examples, including the section called “Manipulating XML Output with Perl”, the section called “Output to a Database”, and the section called “Creating HTML Reports”.

A key advantage of XML is that you do not need to write your own parser as you do for specialized Nmap output types such as grepable and interactive output. Any general XML parser should do.
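As a rough illustration of how little code a general parser requires, the following Python sketch uses the standard library's ElementTree module to list the open ports of each host. The SAMPLE document is a trimmed fragment modeled on Examples 13.9 and 13.10, not real scan output, and the function name open_ports_by_host is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative fragment modeled on Examples 13.9 and 13.10.
SAMPLE = """<nmaprun scanner="nmap" version="5.59BETA3">
 <host>
  <status state="up" reason="echo-reply"/>
  <address addr="74.207.244.221" addrtype="ipv4"/>
  <hostnames>
   <hostname name="scanme.nmap.org" type="user"/>
  </hostnames>
  <ports>
   <port protocol="tcp" portid="22"><state state="open" reason="syn-ack"/></port>
   <port protocol="tcp" portid="80"><state state="open" reason="syn-ack"/></port>
   <port protocol="tcp" portid="113"><state state="closed" reason="reset"/></port>
  </ports>
 </host>
</nmaprun>"""


def open_ports_by_host(root):
    """Map each host address to a sorted list of its open (protocol, port) pairs."""
    result = {}
    for host in root.iter("host"):
        # A host may carry several address elements; this sketch takes the first.
        address = host.find("address").get("addr")
        open_ports = [
            (port.get("protocol"), int(port.get("portid")))
            for port in host.iterfind("ports/port")
            if port.find("state").get("state") == "open"
        ]
        result[address] = sorted(open_ports)
    return result


summary = open_ports_by_host(ET.fromstring(SAMPLE))
print(summary)  # prints {'74.207.244.221': [('tcp', 22), ('tcp', 80)]}
```

For a file written with -oX, ET.parse(filename).getroot() would supply the same kind of root element.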

Nmap XML output can of course be viewed in any text editor or XML editor. Some spreadsheet programs, including Microsoft Excel, are able to import Nmap XML data directly for viewing. These general-purpose XML processors share the limitation that they treat Nmap XML generically, just like any other XML file. They don't understand the relative importance of elements, nor how to organize the data for a more useful presentation. The use of specialized XML processors that make sense of Nmap XML output is the subject of the following sections.