Library slaxml
This is the NSE implementation of SLAXML.
SLAXML is a pure-Lua SAX-like streaming XML parser. It is more robust than many simpler pattern-based parsers, properly supporting code like <expr test="5 > 7" />, CDATA nodes, comments, namespaces, and processing instructions.
It is not currently a fully conforming XML parser, however, because it allows certain syntactically invalid (not well-formed) XML to be parsed without reporting an error.
The streaming parser does a single pass through the input and reports what it sees along the way.
You can optionally ignore whitespace-only text nodes using the stripWhitespace option.
The library contains the parser class and the parseDOM function.
Basic usage of the library:
  local slaxml = require "slaxml"
  local parser = slaxml.parser:new()
  parser:parseSAX(xmlbody, {stripWhitespace=true})
To specify custom callbacks, use:
  local call_backs = {
    startElement = function(name,nsURI,nsPrefix) end,       -- when "<foo" or "<x:foo" is seen
    attribute    = function(name,value,nsURI,nsPrefix) end, -- attribute found on the current element
    closeElement = function(name,nsURI,nsPrefix) end,       -- when "</foo>", "</x:foo>" or "/>" is seen
    text         = function(text) end,                      -- text and CDATA nodes
    comment      = function(content) end,                   -- comments
    pi           = function(target,content) end,            -- processing instructions, e.g. "<?yes mon?>"
  }
  local parser = slaxml.parser:new(call_backs)
  parser:parseSAX(xmlbody)
The library also contains the parseDOM function. To get the DOM table, call parseDOM as follows:
  local doc = slaxml.parseDOM(xmlbody, options)
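Callbacks receive only the event data, so any state they need (such as "am I inside a given element?") has to live in variables the callbacks share. The following is a minimal sketch of that pattern, collecting the text of every <methodName> element. Because loading slaxml requires Nmap's NSE environment, the sketch calls the callbacks directly in the order the parser would be expected to emit them for a small sample document; in a script you would instead pass the table to slaxml.parser:new() and call parseSAX(). The variable names and sample input are illustrative.

```lua
-- Shared parsing state, captured as upvalues by the callbacks.
local names = {}
local inMethodName = false

local call_backs = {
  startElement = function(name, nsURI, nsPrefix)
    inMethodName = (name == "methodName")
  end,
  closeElement = function(name, nsURI, nsPrefix)
    inMethodName = false
  end,
  text = function(text)
    if inMethodName then names[#names + 1] = text end
  end,
}

-- Stand-in for parser:parseSAX() on (illustrative input):
-- <methodResponse><methodName>system.listMethods</methodName>
--   <methodName>system.methodHelp</methodName></methodResponse>
local cb = call_backs
cb.startElement("methodResponse")
cb.startElement("methodName"); cb.text("system.listMethods"); cb.closeElement("methodName")
cb.startElement("methodName"); cb.text("system.methodHelp"); cb.closeElement("methodName")
cb.closeElement("methodResponse")

print(table.concat(names, ", "))  -- system.listMethods, system.methodHelp
```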
DOM Table Features
Document - the root table returned from the parseDOM() function.
  doc.type: the string "document"
  doc.name: the string "#doc"
  doc.kids: an array table of child processing instructions, the root element, and comment nodes
  doc.root: the root element for the document
Element
  someEl.type: the string "element"
  someEl.name: the string name of the element (without any namespace prefix)
  someEl.nsURI: the namespace URI for this element; nil if no namespace is applied
  someEl.attr: a table of attributes, indexed both by name and by position
    local value = someEl.attr['attribute-name']  -- any namespace prefix of the attribute is not part of the name
    local someAttr = someEl.attr[1]              -- a single attribute table (see below); useful for iterating over all attributes of an element, or for disambiguating attributes with the same name in different namespaces
  someEl.kids: an array table of child elements, text nodes, comment nodes, and processing instructions
  someEl.el: an array table of child elements only
  someEl.parent: reference to the parent element or document table
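Because someEl.el holds only child elements, a recursive search can follow it and never has to skip text or comment nodes. The sketch below assumes a tree shaped as described above; makeElement and findAll are hypothetical helpers, and makeElement exists only to build such a tree by hand without running parseDOM.

```lua
-- Hypothetical helper: builds a table with the documented element fields.
-- attrs is a list of {name, value} pairs; children may include text nodes.
local function makeElement(name, attrs, children)
  local e = { type = "element", name = name, attr = {}, kids = {}, el = {} }
  for i, pair in ipairs(attrs or {}) do
    e.attr[i] = { type = "attribute", name = pair[1], value = pair[2], parent = e }
    e.attr[pair[1]] = pair[2]  -- name-indexed entry maps to the value
  end
  for _, child in ipairs(children or {}) do
    child.parent = e
    e.kids[#e.kids + 1] = child
    if child.type == "element" then e.el[#e.el + 1] = child end
  end
  return e
end

-- Hypothetical depth-first search that follows someEl.el only.
local function findAll(el, name, out)
  out = out or {}
  if el.name == name then out[#out + 1] = el end
  for _, child in ipairs(el.el) do findAll(child, name, out) end
  return out
end

local root = makeElement("hosts", nil, {
  makeElement("host", { { "addr", "10.0.0.1" } }),
  { type = "text", name = "#text", value = "\n" },
  makeElement("host", { { "addr", "10.0.0.2" } }),
})

for _, host in ipairs(findAll(root, "host")) do
  print(host.attr["addr"])  -- 10.0.0.1, then 10.0.0.2
end
```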
Attribute
  someAttr.type: the string "attribute"
  someAttr.name: the name of the attribute (without any namespace prefix)
  someAttr.value: the string value of the attribute (with XML and numeric entities unescaped)
  someAttr.nsURI: the namespace URI for the attribute; nil if no namespace is applied
  someAttr.parent: reference to the owning element table
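When two attributes share a local name but differ in namespace, a lookup by name alone cannot tell them apart, so the array part of someEl.attr has to be scanned instead. A minimal sketch follows; getAttrNS is a hypothetical helper, and the element table is built by hand with the documented attribute fields rather than produced by parseDOM.

```lua
-- Hand-built element with two "href" attributes in different namespaces.
local XLINK = "http://www.w3.org/1999/xlink"
local el = { type = "element", name = "a", attr = {} }
el.attr[1] = { type = "attribute", name = "href", value = "local.html", parent = el }  -- no namespace: nsURI is nil
el.attr[2] = { type = "attribute", name = "href", value = "remote.html", nsURI = XLINK, parent = el }

-- Hypothetical helper: returns the value of the attribute matching both
-- the local name and the namespace URI (nil means "no namespace").
local function getAttrNS(element, name, nsURI)
  for _, a in ipairs(element.attr) do
    if a.name == name and a.nsURI == nsURI then
      return a.value
    end
  end
  return nil
end

print(getAttrNS(el, "href", XLINK))  -- remote.html
print(getAttrNS(el, "href", nil))    -- local.html
```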
Text - for both CDATA and normal text nodes
  someText.type: the string "text"
  someText.name: the string "#text"
  someText.value: the string content of the text node (with XML and numeric entities unescaped for non-CDATA text)
  someText.parent: reference to the parent element table
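Because CDATA and ordinary text both appear as nodes of type "text", an element's full text can be gathered by concatenating its text nodes recursively, much like textContent in browser DOMs. This is a sketch over a hand-built tree; textContent here is a hypothetical helper, not part of slaxml.

```lua
-- Hypothetical helper: concatenates every text node below a node in
-- document order, descending into child elements and ignoring comments/PIs.
local function textContent(node)
  if node.type == "text" then return node.value end
  local parts = {}
  for _, kid in ipairs(node.kids or {}) do
    if kid.type == "text" or kid.type == "element" then
      parts[#parts + 1] = textContent(kid)
    end
  end
  return table.concat(parts)
end

-- Hand-built tree for: <p>Hello <b>big</b> world<!-- c --></p>
local p = { type = "element", name = "p", kids = {
  { type = "text", name = "#text", value = "Hello " },
  { type = "element", name = "b", kids = {
    { type = "text", name = "#text", value = "big" },
  } },
  { type = "text", name = "#text", value = " world" },
  { type = "comment", name = "#comment", value = " c " },
} }

print(textContent(p))  -- Hello big world
```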
Comment
  someComment.type: the string "comment"
  someComment.name: the string "#comment"
  someComment.value: the string content of the comment
  someComment.parent: reference to the parent element or document table
Processing Instruction
  somePI.type: the string "pi"
  somePI.name: the string name of the PI, e.g. <?foo …?> has a name of "foo"
  somePI.value: the string content of the PI, i.e. everything but the name
  somePI.parent: reference to the parent element or document table
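Comments and processing instructions sit in the same kids arrays as elements and text, so picking them out is a matter of filtering on the type field. The sketch below does this for a hand-built document table; kidsOfType is a hypothetical helper.

```lua
-- Hypothetical helper: children of an element or document with a given type.
local function kidsOfType(node, nodeType)
  local out = {}
  for _, kid in ipairs(node.kids) do
    if kid.type == nodeType then out[#out + 1] = kid end
  end
  return out
end

-- Hand-built document for: <?xml-stylesheet href="s.xsl"?><!--generated--><r/>
local root = { type = "element", name = "r", attr = {}, kids = {}, el = {} }
local doc = { type = "document", name = "#doc", root = root, kids = {
  { type = "pi", name = "xml-stylesheet", value = 'href="s.xsl"' },
  { type = "comment", name = "#comment", value = "generated" },
  root,
} }

for _, pi in ipairs(kidsOfType(doc, "pi")) do
  print(pi.name, pi.value)
end
for _, c in ipairs(kidsOfType(doc, "comment")) do
  print(c.value)
end
```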
Authors:
Source: https://svn.nmap.org/nmap/nselib/slaxml.lua
Script Arguments
- slaxml.debug
Debug level at which default callbacks will print detailed parsing info. Default: 3
Functions
- attribute (name, value, nsURI, nsPrefix)
A callback for attributes. To use it, define attribute = function(<name>, <value>, <nsURI>, <nsPrefix>) <function body> end in the parser._call table. Executes whenever an attribute is found.
Parameters
  - name: The name of the attribute.
  - value: The value of the attribute.
  - nsURI: The namespace URI.
  - nsPrefix: The namespace prefix.
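Since the attribute callback receives only the attribute itself, a script that needs to know which element it belongs to can remember the most recently started element in startElement. A minimal sketch, assuming (as the callback descriptions indicate) that attributes are reported after the startElement of the element that carries them; the direct calls stand in for a parseSAX() run, and the sample input is illustrative.

```lua
-- One record per element: its name and a name -> value map of attributes.
local elements = {}
local current

local call_backs = {
  startElement = function(name, nsURI, nsPrefix)
    current = { name = name, attrs = {} }
    elements[#elements + 1] = current
  end,
  attribute = function(name, value, nsURI, nsPrefix)
    -- Assumes attributes follow the startElement of their element.
    current.attrs[name] = value
  end,
}

-- Stand-in for parseSAX() on (illustrative input):
-- <port protocol="tcp" portid="22"><state state="open"/></port>
local cb = call_backs
cb.startElement("port")
cb.attribute("protocol", "tcp")
cb.attribute("portid", "22")
cb.startElement("state")
cb.attribute("state", "open")

for _, e in ipairs(elements) do
  print(e.name, e.attrs.portid or e.attrs.state)
end
```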
- closeElement (name, nsURI, nsPrefix)
A callback for the end of elements. To use it, define closeElement = function(<name>, <nsURI>, <nsPrefix>) <function body> end in the parser._call table. Executes whenever an element closes.
Parameters
  - name: The name of the element.
  - nsURI: The namespace URI.
  - nsPrefix: The namespace prefix.
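Pairing startElement with closeElement lets a script keep a stack of open elements and act only on text found at a particular path. The sketch below assumes events arrive properly nested; the path string and sample document are illustrative, and the direct calls stand in for a parseSAX() run.

```lua
-- Stack of open element names; startElement pushes, closeElement pops.
local stack = {}
local titles = {}

local call_backs = {
  startElement = function(name, nsURI, nsPrefix)
    stack[#stack + 1] = name
  end,
  closeElement = function(name, nsURI, nsPrefix)
    stack[#stack] = nil
  end,
  text = function(text)
    -- React only to text directly inside rss/channel/title.
    if table.concat(stack, "/") == "rss/channel/title" then
      titles[#titles + 1] = text
    end
  end,
}

-- Stand-in for parseSAX() on (illustrative input):
-- <rss><channel><title>News</title><item><title>A</title></item></channel></rss>
local cb = call_backs
cb.startElement("rss"); cb.startElement("channel")
cb.startElement("title"); cb.text("News"); cb.closeElement("title")
cb.startElement("item")
cb.startElement("title"); cb.text("A"); cb.closeElement("title")
cb.closeElement("item")
cb.closeElement("channel"); cb.closeElement("rss")

print(table.concat(titles, ", "))  -- News
```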
- comment (content)
A callback for comments. To use it, define comment = function(<content>) <function body> end in the parser._call table. Executes whenever a comment is encountered.
Parameters
  - content: The comment body itself.
- parseDOM (xml, options)
Parses xml and returns a DOM table.
Parameters
  - xml: The XML body to be parsed.
  - options: Options table, if any. Currently only stripWhitespace is supported.
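- parseSAX (self, xml, options)
Parses the XML in a SAX-like manner, invoking the callbacks defined in the parser._call table.
Parameters
  - self: The parser object (passed implicitly when called as parser:parseSAX(...)).
  - xml: The XML body to be parsed.
  - options: Options table, if any (for example, {stripWhitespace=true}).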
- pi (target, content)
A callback for processing instructions. To use it, define pi = function(<target>, <content>) <function body> end in the parser._call table. Executes whenever a processing instruction is found.
Parameters
  - target: The PI target.
  - content: The PI content; any value not containing the sequence '?>'.
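Some processing instructions, such as xml-stylesheet, carry name="value" pairs in their content. Since the pi callback hands that content over as a single string, a script can split it itself. The pattern-based pseudoAttrs helper below is a hypothetical sketch that only handles simple single- or double-quoted values; it is not part of slaxml, and the direct call stands in for the parser reaching the PI.

```lua
-- Hypothetical helper: splits PI content written as pseudo-attributes.
-- Pseudo-attribute syntax is a convention of certain PIs, not general XML.
local function pseudoAttrs(content)
  local t = {}
  -- name, optional spaces, '=', opening quote (captured), lazy value, same quote
  for k, _, v in content:gmatch('([%w_:%-]+)%s*=%s*(["\'])(.-)%2') do
    t[k] = v
  end
  return t
end

local stylesheets = {}
local call_backs = {
  pi = function(target, content)
    if target == "xml-stylesheet" then
      stylesheets[#stylesheets + 1] = pseudoAttrs(content).href
    end
  end,
}

-- Stand-in for the parser reaching: <?xml-stylesheet href="nmap.xsl" type="text/xsl"?>
call_backs.pi("xml-stylesheet", 'href="nmap.xsl" type="text/xsl"')
print(stylesheets[1])  -- nmap.xsl
```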
- startElement (name, nsURI, nsPrefix)
A callback for the start of elements. To use it, define startElement = function(<name>, <nsURI>, <nsPrefix>) <function body> end in the parser._call table. Executes whenever an element starts.
Parameters
  - name: The name of the element.
  - nsURI: The namespace URI.
  - nsPrefix: The namespace prefix.
- text (text)
A callback for text content. To use it, define text = function(<text>) <function body> end in the parser._call table. Executes whenever text content (including CDATA) is found.
Parameters
  - text: The actual text.
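Text belonging to one element can arrive in several text calls, for example before and after a child element. One way to reassemble it is to buffer text per open element and join it in closeElement, as in this sketch. The buffering scheme is an illustrative assumption, whitespace-only pieces are skipped by hand here (stripWhitespace is the parser-level alternative), and the direct calls stand in for parseSAX().

```lua
-- Stack of per-element text buffers; results collects "name=text" strings.
local buffers = {}
local results = {}

local call_backs = {
  startElement = function(name)
    buffers[#buffers + 1] = { name = name }
  end,
  text = function(text)
    local buf = buffers[#buffers]
    -- Skip text outside any element and whitespace-only pieces.
    if buf and text:find("%S") then buf[#buf + 1] = text end
  end,
  closeElement = function(name)
    local buf = table.remove(buffers)
    results[#results + 1] = buf.name .. "=" .. table.concat(buf)
  end,
}

-- Stand-in for parseSAX() on (illustrative input): <msg>Hello, <b>you</b>!</msg>
local cb = call_backs
cb.startElement("msg"); cb.text("Hello, ")
cb.startElement("b"); cb.text("you"); cb.closeElement("b")
cb.text("!"); cb.closeElement("msg")

print(table.concat(results, "; "))  -- b=you; msg=Hello, !
```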