Library slaxml

This is the NSE implementation of SLAXML, a pure-Lua, SAX-like streaming XML parser. It is more robust than many simpler pattern-based parsers: it properly supports input such as <expr test="5 > 7" />, CDATA nodes, comments, namespaces, and processing instructions. It is not a strictly conforming XML parser, however: it accepts certain input that is not well-formed XML without reporting an error. The streaming parser makes a single pass through the input and reports what it sees along the way. You can optionally ignore whitespace-only text nodes with the stripWhitespace option. The library contains the parser class and the parseDOM function.

Basic Usage of the library:

local slaxml = require "slaxml"
local parser = slaxml.parser:new()
parser:parseSAX(xmlbody, {stripWhitespace=true})

To specify custom callbacks, use:

local call_backs = {
  startElement = function(name,nsURI,nsPrefix)       end, -- when "<foo" or "<x:foo" is seen
  attribute    = function(name,value,nsURI,nsPrefix) end, -- attribute found on the current element
  closeElement = function(name,nsURI)                end, -- when "</foo>", "</x:foo>", or "/>" is seen
  text         = function(text)                      end, -- text and CDATA nodes
  comment      = function(content)                   end, -- comments
  pi           = function(target,content)            end, -- processing instructions, e.g. "<?yes mon?>"
}
local parser = slaxml.parser:new(call_backs)
parser:parseSAX(xmlbody)

The library also provides the parseDOM function, which returns the DOM table:

local doc = slaxml.parseDOM(xmlbody, options)
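
As a rough illustration of how the callbacks above can cooperate, the sketch below records the path of every element using a stack. It is only a sketch: the events for <a><b/><c/></a> are replayed by hand so that it runs outside Nmap, and the order in which they are replayed is an assumption about how the parser reports them. The names paths and stack are hypothetical.

```lua
-- Sketch: collect the path of every element with SAX-style callbacks.
local paths, stack = {}, {}

local call_backs = {
  startElement = function(name, nsURI, nsPrefix)
    stack[#stack + 1] = name                      -- push the element name
    paths[#paths + 1] = table.concat(stack, "/")  -- record the current path
  end,
  closeElement = function(name, nsURI)
    stack[#stack] = nil                           -- pop on close
  end,
}

-- In an NSE script the table would be handed to the parser instead:
--   local slaxml = require "slaxml"
--   local parser = slaxml.parser:new(call_backs)
--   parser:parseSAX(xmlbody, {stripWhitespace=true})
-- Here the events for <a><b/><c/></a> are replayed by hand (assumed order).
call_backs.startElement("a")
call_backs.startElement("b"); call_backs.closeElement("b")
call_backs.startElement("c"); call_backs.closeElement("c")
call_backs.closeElement("a")

-- paths now holds "a", "a/b", "a/c"
```

Because the callbacks receive no parser state, any context (such as the current depth) has to be kept in upvalues like the stack here.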

DOM Table Features

Document - the root table returned from the parseDOM() method.

  • doc.type : the string "document"
  • doc.name : the string "#doc"
  • doc.kids : an array table of child processing instructions, the root element, and comment nodes.
  • doc.root : the root element for the document

Element

  • someEl.type : the string "element"
  • someEl.name : the string name of the element (without any namespace prefix)
  • someEl.nsURI : the namespace URI for this element; nil if no namespace is applied
  • someEl.attr : a table of attributes, indexed both by name and by position

local value = someEl.attr['attribute-name'] : lookup by name; any namespace prefix of the attribute is not part of the name

local someAttr = someEl.attr[1] : a single attribute table (see below); useful for iterating over all attributes of an element, or for disambiguating attributes with the same name in different namespaces

  • someEl.kids : an array table of child elements, text nodes, comment nodes, and processing instructions
  • someEl.el : an array table of child elements only
  • someEl.parent : reference to the parent element or document table
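
The element fields listed above are enough to walk a tree recursively. The following is a minimal sketch that uses a hand-built table in place of real parseDOM() output, so the table shape is an assumption based on the field list; walk and seen are hypothetical names.

```lua
-- Hand-built stand-in for the table parseDOM() would return for
--   <root><item/><item><sub/></item></root>
-- Only the fields used below are filled in.
local sub   = { type = "element", name = "sub",  kids = {}, el = {} }
local item1 = { type = "element", name = "item", kids = {}, el = {} }
local item2 = { type = "element", name = "item", kids = { sub }, el = { sub } }
local root  = { type = "element", name = "root",
                kids = { item1, item2 }, el = { item1, item2 } }
local doc   = { type = "document", name = "#doc", kids = { root }, root = root }

-- Visit an element and its descendants through the element-only .el array.
local function walk(el, depth, visit)
  visit(el, depth)
  for _, child in ipairs(el.el) do
    walk(child, depth + 1, visit)
  end
end

local seen = {}
walk(doc.root, 0, function(el, depth)
  seen[#seen + 1] = string.rep("  ", depth) .. el.name
end)
-- seen now holds "root", "  item", "  item", "    sub"
```

Iterating over .el rather than .kids skips text, comment, and processing-instruction nodes, which is usually what an element walk wants.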

Attribute

  • someAttr.type : the string "attribute"
  • someAttr.name : the name of the attribute (without any namespace prefix)
  • someAttr.value : the string value of the attribute (with XML and numeric entities unescaped)
  • someAttr.nsURI : the namespace URI for the attribute; nil if no namespace is applied
  • someAttr.parent : reference to the owning element table
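
When two attributes share a name but differ in namespace, the positional entries can be used to tell them apart. The sketch below assumes the attribute table shape listed above and uses a hand-built element; find_attr and the URI urn:example:x are hypothetical.

```lua
-- Hand-built stand-in for an element such as
--   <a href="plain" x:href="ns" xmlns:x="urn:example:x"/>
local el = { type = "element", name = "a", attr = {} }
el.attr[1] = { type = "attribute", name = "href", value = "plain",
               nsURI = nil, parent = el }
el.attr[2] = { type = "attribute", name = "href", value = "ns",
               nsURI = "urn:example:x", parent = el }

-- Return the value of the attribute matching both name and namespace URI,
-- or nil when no such attribute exists.
local function find_attr(element, name, nsURI)
  for _, attr in ipairs(element.attr) do
    if attr.name == name and attr.nsURI == nsURI then
      return attr.value
    end
  end
  return nil
end

local plain = find_attr(el, "href", nil)              -- "plain"
local namespaced = find_attr(el, "href", "urn:example:x")  -- "ns"
```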

Text - for both CDATA and normal text nodes

  • someText.type : the string "text"
  • someText.name : the string "#text"
  • someText.value : the string content of the text node (with XML and numeric entities unescaped for non-CDATA nodes)
  • someText.parent : reference to the parent element table
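
Since text lives in separate nodes among an element's kids, collecting the full text of an element means concatenating over its descendants. A minimal sketch, assuming the node shapes described above and using a hand-built element (text_of is a hypothetical helper):

```lua
-- Hand-built stand-in for the element parseDOM() would produce for
--   <p>Hello <b>big</b> world<!-- note --></p>
local p = { type = "element", name = "p" }
local b = { type = "element", name = "b", parent = p }
b.kids = { { type = "text", name = "#text", value = "big", parent = b } }
p.kids = {
  { type = "text", name = "#text", value = "Hello ", parent = p },
  b,
  { type = "text", name = "#text", value = " world", parent = p },
  { type = "comment", name = "#comment", value = " note ", parent = p },
}

-- Concatenate the text of a node and of all its descendant elements,
-- skipping comments and processing instructions.
local function text_of(node)
  if node.type == "text" then
    return node.value
  end
  local parts = {}
  for _, kid in ipairs(node.kids or {}) do
    if kid.type == "text" or kid.type == "element" then
      parts[#parts + 1] = text_of(kid)
    end
  end
  return table.concat(parts)
end

local content = text_of(p)  -- "Hello big world"
```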

Comment

  • someComment.type : the string "comment"
  • someComment.name : the string "#comment"
  • someComment.value : the string content of the comment
  • someComment.parent : reference to the parent element or document table

Processing Instruction

  • somePI.type : the string "pi"
  • somePI.name : the string name of the PI, e.g. <?foo …?> has a name of "foo"
  • somePI.value : the string content of the PI, i.e. everything but the name
  • somePI.parent : reference to the parent element or document table
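
Processing instructions and comments that appear outside the root element sit in doc.kids next to it, so they can be picked out by their type field. The sketch below uses a hand-built document table; the exact value string a real parse would store for this PI (for example, how surrounding whitespace is handled) is an assumption.

```lua
-- Hand-built stand-in for the document parseDOM() would return for
--   <?xml-stylesheet href="style.xsl"?><!-- generated --><root/>
local doc = { type = "document", name = "#doc", kids = {
  { type = "pi", name = "xml-stylesheet", value = 'href="style.xsl"' },
  { type = "comment", name = "#comment", value = " generated " },
  { type = "element", name = "root", kids = {}, el = {}, attr = {} },
} }
doc.root = doc.kids[3]

-- Index the document-level processing instructions by their target name.
local pis = {}
for _, node in ipairs(doc.kids) do
  if node.type == "pi" then
    pis[node.name] = node.value
  end
end

local stylesheet = pis["xml-stylesheet"]  -- 'href="style.xsl"'
```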

Authors:

  • Gavin Kistner (original pure-Lua implementation)
  • Gyanendra Mishra (NSE-specific implementation)

Source: https://svn.nmap.org/nmap/nselib/slaxml.lua

Script Arguments

slaxml.debug

Debug level at which default callbacks will print detailed parsing info. Default: 3

Functions

attribute (name, value, nsURI, nsPrefix)

A callback for attributes. To use it, define attribute = function(<name>, <value>, <nsURI>, <nsPrefix>) <function body> end in the parser._call table. Executes whenever an attribute is found.

Parameters

name
The name of the attribute.
value
The value of the attribute.
nsURI
The namespace URI.
nsPrefix
The namespace prefix.

closeElement (name, nsURI, nsPrefix)

A callback for the end of elements. To use it, define closeElement = function(<name>, <nsURI>, <nsPrefix>) <function body> end in the parser._call table. Executes whenever an element closes.

Parameters

name
The name of the element.
nsURI
The namespace URI.
nsPrefix
The namespace prefix.

comment (content)

A callback for comments. To use it, define comment = function(<content>) <function body> end in the parser._call table. Executes whenever a comment is encountered.

Parameters

content
The comment body itself.

parseDOM (xml, options)

Parses XML and returns a DOM table.

Parameters

xml
The XML body to be parsed.
options
Options to use, if any. Currently supports stripWhitespace.

parseSAX (self, xml, options)

Parses the XML in a SAX-like manner.

Parameters

self
The parser object.
xml
The XML body to be parsed.
options
Options, if any are specified.

pi (target, content)

A callback for processing instructions. To use it, define pi = function(<target>, <content>) <function body> end in the parser._call table. Executes whenever a processing instruction is found.

Parameters

target
The PI target.
content
The PI content: any value not containing the sequence '?>'.

startElement (name, nsURI, nsPrefix)

A callback for the start of elements. To use it, define startElement = function(<name>, <nsURI>, <nsPrefix>) <function body> end in the parser._call table. Executes whenever an element starts.

Parameters

name
The name of the element.
nsURI
The namespace URI.
nsPrefix
The namespace prefix.

text (text)

A callback for text content. To use it, define text = function(<text>) <function body> end in the parser._call table. Executes whenever plain text is found.

Parameters

text
The actual text.