Library http
Implements the HTTP client protocol in a standard form that Nmap scripts can take advantage of.
Because HTTP has so many uses, there are a number of interfaces to this library.
The most obvious and common ones are simply get, post,
and head; or, if more control is required, generic_request
can be used. These functions do what one would expect. The get_url
helper function can be used to parse and retrieve a full URL.
These functions return a table of values, including:
- status-line: A string representing the status, such as "HTTP/1.1 200 OK"
- header: An associative array representing the header. Keys are all lowercase, and standard headers, such as 'date', 'content-length', etc. will typically be present.
- rawheader: A numbered array of the headers, exactly as the server sent them. While header['content-type'] might be 'text/html', rawheader[3] might be 'Content-type: text/html'.
- cookies: A numbered array of the cookies the server sent. Each cookie is a table with the following keys: name, value, path, domain, and expires.
- body: The full body, as returned by the server.
If a script is planning on making a lot of requests, the pipelining functions can
be helpful. pipeline_add queues requests in a table, and
pipeline_go performs the requests, returning the results as an array,
with the responses in the same order as the requests were added. As a simple example:
-- Start by defining the 'all' variable as nil
local all = nil
-- Add two 'GET' requests and one 'HEAD' to the queue. These requests are not performed
-- yet. The second parameter represents the 'options' table, which we don't need.
all = http.pipeline_add('/book', nil, all)
all = http.pipeline_add('/test', nil, all)
all = http.pipeline_add('/monkeys', nil, all, 'HEAD')
-- Perform all three requests, with as much parallelism as Nmap allows
local results = http.pipeline_go('nmap.org', 80, all)
At this point, results is an array with three elements. Each element
is a table containing the HTTP result, as discussed above.
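A minimal sketch of consuming such a results array. The helper name summarize_results is hypothetical, and it relies only on the status field of the response table described in this documentation. How a failed request appears in the array is not specified here, so the sketch guards against a missing entry as well as a missing status.

```lua
-- Hypothetical helper: turn an array of response tables into printable lines.
-- 'paths' lists the queued paths in the same order as the requests were added.
local function summarize_results(paths, results)
  local lines = {}
  for i, path in ipairs(paths) do
    local response = results and results[i]
    local status = response and response.status or "no response"
    lines[#lines + 1] = string.format("%s -> %s", path, tostring(status))
  end
  return lines
end

-- Inside an NSE script, this could follow the pipeline example:
--   local results = http.pipeline_go('nmap.org', 80, all)
--   for _, line in ipairs(summarize_results({'/book', '/test', '/monkeys'}, results)) do
--     stdnse.print_debug(1, "%s", line)
--   end
```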
One more interface provided by the HTTP library helps scripts determine whether or not
a page exists. The identify_404 function will try several URLs on the
server to determine what the server's 404 pages look like. It will attempt to identify
customized 404 pages that may not return the actual status code 404. If successful,
the function page_exists can then be used to determine whether or not
a page existed.
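The two functions are meant to be used together, which can be sketched as follows. The wrapper name path_exists is hypothetical. The http table is passed in as a parameter only so that the sketch can be exercised with a stub outside Nmap; in a script it would be the loaded library.

```lua
-- Hypothetical wrapper around identify_404 and page_exists.
-- 'http' is the library table (or a stub with the same three functions).
local function path_exists(http, host, port, path)
  -- Learn how this server answers requests for pages that do not exist.
  local ok, result_404, known_404 = http.identify_404(host, port)
  if not ok then
    -- On failure, the second return value is an error message.
    return nil, result_404
  end

  local response = http.get(host, port, path)
  if not response then
    return nil, "request failed"
  end

  -- Compare the response against what a missing page looks like.
  return http.page_exists(response, result_404, known_404, path, false)
end
```

A script that checks many paths would normally call identify_404 once per server and reuse its results, rather than repeating it for every path as this compact sketch does.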
Some other miscellaneous functions that can come in handy are response_contains,
can_use_head, and save_path. See the appropriate documentation
for them.
The response to each function is typically a table on success or nil on failure. If
a table is returned, the following keys will exist:
status-line: The HTTP status line; for example, "HTTP/1.1 200 OK" (note: this is followed by a newline)
status: The HTTP status value; for example, "200"
header: A table of header values, where the keys are lowercase and the values are exactly what the server sent
rawheader: A list of header values as "name: value" strings, in the exact format and order that the server sent them
cookies: A list of cookies that the server is sending. Each cookie is a table containing the keys name, value, and path. This table can be sent back to the server in subsequent requests through the options table of any function (see below).
body: The body of the response
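As an illustration, a caller might inspect these keys as sketched below. The function name describe_response is hypothetical; the sketch checks for nil first, since that is how failure is reported, and treats every other field as possibly absent.

```lua
-- Hypothetical helper: build a one-line description of a response table.
local function describe_response(response)
  if response == nil then
    return "request failed"
  end
  -- Header keys are lowercase, so 'Content-Type' is looked up as 'content-type'.
  local content_type = response.header and response.header["content-type"] or "unknown"
  local body_length = response.body and #response.body or 0
  return string.format("status=%s type=%s bytes=%d",
    tostring(response.status), content_type, body_length)
end
```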
Many of the functions optionally allow an 'options' table. This table can alter the HTTP headers or other values like the timeout. The following are valid values in 'options' (note: not all options will necessarily affect every function):
- timeout: A timeout used for socket operations.
- header: A table containing additional headers to be used for the request. For example, options['header']['Content-Type'] = 'text/xml'
- content: The content of the message (content-length will be added; set header['Content-Length'] to override). This can be either a string, which will be directly added as the body of the message, or a table, which will have each key=value pair added (like a normal POST request).
- cookies: A list of cookies as either a string, which will be directly sent, or a table. If it's a table, the following fields are recognized:
  - name
  - value
  - path
- auth: A table containing the keys username and password, which will be used for HTTP Basic authentication.
- bypass_cache: Do not perform a lookup in the local HTTP cache.
- no_cache: Do not save the result of this request to the local HTTP cache.
- no_cache_body: Do not save the body of the response to the local HTTP cache.
- redirect_ok: Closure that overrides the default redirect_ok used to validate whether to follow HTTP redirects or not. False, if no HTTP redirects should be followed.
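For example, the following redirect_ok closure follows at most five redirects:
redirect_ok = function(host,port)
  -- Number of redirects still allowed
  local c = 5
  return function(url)
    if ( c==0 ) then return false end
    c = c - 1
    return true
  end
end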
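A minimal sketch of an options table that combines several of the fields listed above. The helper name build_options, the cookie, and the header value are placeholders chosen for illustration. The timeout unit is not stated in this section; milliseconds is an assumption.

```lua
-- Hypothetical helper that assembles an options table from documented fields.
local function build_options(username, password)
  return {
    timeout = 5000,        -- assumption: milliseconds
    header = { ["Accept-Language"] = "en" },
    cookies = { { name = "session", value = "abc123", path = "/" } },
    auth = { username = username, password = password },
    bypass_cache = true,   -- skip the cache lookup
    no_cache = true,       -- and do not store the response either
    redirect_ok = false,   -- follow no redirects
  }
end

-- Inside an NSE script the table would be passed as the last argument, e.g.:
--   local response = http.get(host, port, "/private/", build_options("admin", "secret"))
```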
Source: http://nmap.org/svn/nselib/http.lua
Script Arguments
http.useragent
The value of the User-Agent header field sent with
requests. By default it is
"Mozilla/5.0 (compatible; Nmap Scripting Engine; http://nmap.org/book/nse.html)".
A value of the empty string disables sending the User-Agent header field.
http-max-cache-size
The maximum memory size (in bytes) of the cache.
http.pipeline
If set, it represents the number of HTTP requests that will be pipelined (i.e., sent together in a single batch). This can be set low to make debugging easier, or it can be set high to test how a server reacts (the server's chosen maximum is ignored).
TODO Implement cache system for http pipelines
Functions
| Function | Description |
| --- | --- |
| can_use_head (host, port, result_404, path) | Determine whether or not the server supports HEAD by requesting / and verifying that it returns 200 and doesn't return data. |
| clean_404 (body) | Try to remove anything that might change within a 404 page, such as a file path, a time, or a date. |
| generic_request (host, port, method, path, options) | Do a single request with a given method. |
| get (host, port, path, options) | Fetches a resource with a GET request and returns the result as a table. |
| get_status_string (data) | Take the data returned from an HTTP request and return the status string. |
| get_url (u, options) | Parses a URL and calls http.get with the result. |
| head (host, port, path, options) | Fetches a resource with a HEAD request. |
| identify_404 (host, port) | Try requesting a non-existent file to determine how the server responds to unknown pages ("404 pages"). |
| page_exists (data, result_404, known_404, page, displayall) | Determine whether or not the page that was returned is a 404 page. |
| parse_date (s) | Parses an HTTP date string in any of the formats from section 3.3.1 of RFC 2616. |
| parse_url (url) | Take a URI or URL in any form and convert it to its component parts. |
| parse_www_authenticate (s) | Parses the WWW-Authenticate header as described in RFC 2616, section 14.47 and RFC 2617, section 1.2. |
| pipeline_add (path, options, all_requests, method) | Adds a pending request to the HTTP pipeline. |
| pipeline_go (host, port, all_requests) | Performs all queued requests in the all_requests variable (created by the pipeline_add function). |
| post (host, port, path, options, ignored, postdata) | Fetches a resource with a POST request. |
| put (host, port, path, options, putdata) | Uploads a file using the PUT method and returns a result table. |
| response_contains (response, pattern, case_sensitive) | Check if the response variable contains the given text. |
| save_path (host, port, path, status, links_to, linked_from, contenttype) | This function should be called whenever a valid path (a path that doesn't contain a known 404 page) is discovered. |
Functions
- can_use_head (host, port, result_404, path)
Determine whether or not the server supports HEAD by requesting / and verifying that it returns 200 and doesn't return data. We implement the check like this because we can't always rely on OPTIONS to tell the truth.
Note: If identify_404 returns a 200 status, HEAD requests should be disabled. Sometimes, servers use a 200 status code with a message explaining that the page wasn't found. In this case, to actually identify a 404 page, we need the full body that a HEAD request doesn't supply. This is determined automatically if the result_404 field is set.
Parameters
- host: The host object.
- port: The port to use.
- result_404: [optional] The result when an unknown page is requested. This is returned by identify_404. If the 404 page returns a 200 code, then we disable HEAD requests.
- path: The path to request; by default, / is used.
Return values:
- A boolean value: true if HEAD is usable, false otherwise.
- If HEAD is usable, the result of the HEAD request is returned (so potentially, a script can avoid an extra call to HEAD).
- clean_404 (body)
Try to remove anything that might change within a 404 page. For example:
- A file path (includes URI)
- A time
- A date
- An execution time (numbers in general, really)
The intention is that two 404 pages from different URIs and taken hours apart should, whenever possible, look the same.
During this function, we're likely going to over-trim things. This is fine -- we want enough to match on that it'll a) be unique, and b) have the best chance of not changing. Even if we remove bits and pieces from the file, as long as it isn't a significant amount, it'll remain unique.
One case this doesn't cover is if the server generates a random haiku for the user.
Parameters
- body: The body of the page.
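The idea can be illustrated with a few Lua patterns. This is a sketch of the technique only, not the library's actual implementation; the function name clean_404_sketch and the specific patterns are assumptions. It deliberately over-trims (for example, it also removes the "/html" in a closing tag), in the spirit described above.

```lua
-- Illustrative only: strip the parts of a page most likely to vary
-- between two requests for different non-existent pages.
local function clean_404_sketch(body)
  local cleaned = body
  -- Drop anything that looks like a path or URI ("/some/page.html").
  cleaned = cleaned:gsub("/[^%s\"'<>]*", "")
  -- Drop times (08:49:37), dates (2010-11-06), and plain numbers such as
  -- execution times by removing digit runs joined by ':', '-', '/', or '.'.
  cleaned = cleaned:gsub("%d[%d:%-/%.]*", "")
  -- Collapse the whitespace left behind.
  cleaned = cleaned:gsub("%s+", " ")
  return cleaned
end

-- Two error pages for different URIs, generated at different times,
-- reduce to the same string.
local a = clean_404_sketch("Not found: /foo/bar.php at 08:49:37 (took 0.12s)")
local b = clean_404_sketch("Not found: /baz/qux.html at 21:03:10 (took 1.70s)")
```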
- generic_request (host, port, method, path, options)
Do a single request with a given method. The response is returned as the standard response table (see the module documentation).
The get, head, and post functions are simple wrappers around generic_request.
Any 1XX (informational) responses are discarded.
Parameters
- host: The host to connect to.
- port: The port to connect to.
- method: The method to use; for example, 'GET', 'HEAD', etc.
- path: The path to retrieve.
- options: [optional] A table that lets the caller control socket timeouts, HTTP headers, and other parameters. For full documentation, see the module documentation (above).
Return value:
nil if an error occurs; otherwise, a table as described in the module documentation.
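generic_request is useful for methods that have no dedicated wrapper. The sketch below sends an OPTIONS request and reads the standard Allow header; the function name allowed_methods is hypothetical, and the http table is passed in only so the sketch can be tried with a stub outside Nmap. As noted for can_use_head, the answer to OPTIONS cannot always be trusted.

```lua
-- Hypothetical helper: ask the server which methods it claims to allow.
local function allowed_methods(http, host, port, path)
  local response = http.generic_request(host, port, "OPTIONS", path)
  if not response or not response.header then
    return nil
  end
  -- Header keys in the response table are lowercase.
  return response.header["allow"]
end
```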
- get (host, port, path, options)
Fetches a resource with a GET request and returns the result as a table. This is a simple wrapper around generic_request, with the added benefit of having local caching and support for HTTP redirects. Redirects are followed only if they pass all the validation rules of the redirect_ok function. This function may be overridden by supplying a custom function in the redirect_ok field of the options array. The default function follows a redirect if the destination:
- Is within the same host or domain
- Has the same port number
- Stays within the current scheme
- Does not exceed MAX_REDIRECT_COUNT redirects
Caching and redirects can be controlled in the options array; see the module documentation for more information.
Parameters
- host: The host to connect to.
- port: The port to connect to.
- path: The path to retrieve.
- options: [optional] A table that lets the caller control socket timeouts, HTTP headers, and other parameters. For full documentation, see the module documentation (above).
Return value:
nil if an error occurs; otherwise, a table as described in the module documentation.
- get_status_string (data)
Take the data returned from an HTTP request and return the status string. Useful for stdnse.print_debug messages and even advanced output.
Parameters
- data: The response table from any HTTP request
Return value:
The best status string we could find: either the actual status string, the status code, or "<unknown status>".
- get_url (u, options)
Parses a URL and calls http.get with the result. The URL can contain all the standard fields: protocol://host:port/path
Parameters
- u: The URL of the host.
- options: [optional] A table that lets the caller control socket timeouts, HTTP headers, and other parameters. For full documentation, see the module documentation (above).
Return value:
nil if an error occurs; otherwise, a table as described in the module documentation.
- head (host, port, path, options)
Fetches a resource with a HEAD request. Like get, this is a simple wrapper around generic_request with response caching. This function also has support for HTTP redirects. Redirects are followed only if they pass all the validation rules of the redirect_ok function. This function may be overridden by supplying a custom function in the redirect_ok field of the options array. The default function follows a redirect if the destination:
- Is within the same host or domain
- Has the same port number
- Stays within the current scheme
- Does not exceed MAX_REDIRECT_COUNT redirects
Caching and redirects can be controlled in the options array; see the module documentation for more information.
Parameters
- host: The host to connect to.
- port: The port to connect to.
- path: The path to retrieve.
- options: [optional] A table that lets the caller control socket timeouts, HTTP headers, and other parameters. For full documentation, see the module documentation (above).
Return value:
nil if an error occurs; otherwise, a table as described in the module documentation.
- identify_404 (host, port)
Try requesting a non-existent file to determine how the server responds to unknown pages ("404 pages"), which a) tells us what to expect when a non-existent page is requested, and b) tells us if the server will be impossible to scan. If the server responds with a 404 status code, as it is supposed to, then this function simply returns 404. If the response carries one of a series of other common status codes, including unauthorized, moved, and others, that code is returned in the same way as a 404.
I (Ron Bowes) have observed one host that responds differently for three scenarios:
- A non-existent page, all lowercase (a login page)
- A non-existent page, with uppercase (a weird error page that says, "Filesystem is corrupt.")
- A page in a non-existent directory (a login page with different font colours)
As a result, I've devised three different 404 tests, one to check each of these conditions. If they all match, the tests can proceed; if any of them are different, we can't check 404s properly.
Parameters
- host: The host object.
- port: The port to which we are establishing the connection.
Return values:
- status: Did we succeed?
- result: If status is false, result is an error message. Otherwise, it's the code to expect (typically, but not necessarily, '404').
- body: A hash of the cleaned-up body that can be used when detecting a 404 page that doesn't return a 404 error code.
- page_exists (data, result_404, known_404, page, displayall)
Determine whether or not the page that was returned is a 404 page. This is actually a pretty simple function, but it's best to keep this logic close to identify_404, since they will generally be used together.
Parameters
- data: The data returned by the HTTP request
- result_404: The status code to expect for non-existent pages. This is returned by identify_404.
- known_404: The 404 page itself, if result_404 is 200. If result_404 is something else, this parameter is ignored and can be set to nil. This is returned by identify_404.
- page: The page being requested (used in error messages).
- displayall: [optional] If set to true, don't exclude non-404 errors (such as 500).
Return value:
A boolean value: true if the page appears to exist, and false if it does not.
- parse_date (s)
Parses an HTTP date string, in any of the following formats from section 3.3.1 of RFC 2616:
- Sun, 06 Nov 1994 08:49:37 GMT (RFC 822, updated by RFC 1123)
- Sunday, 06-Nov-94 08:49:37 GMT (RFC 850, obsoleted by RFC 1036)
- Sun Nov 6 08:49:37 1994 (ANSI C's asctime() format)
Parameters
- s: The date string.
Return value:
A table with keys year, month, day, hour, min, sec, and isdst, relative to GMT, suitable for input to os.time.
- parse_url (url)
Take a URI or URL in any form and convert it to its component parts. The URL can optionally have a protocol definition ('http://'), a server ('scanme.insecure.org'), a port (':80'), a URI ('/test/file.php'), and a query string ('?username=ron&password=turtle'). At the minimum, a path or protocol and url are required.
Parameters
- url: The incoming URL to parse
Return value:
A table containing the result, which can have the following fields: protocol, hostname, port, uri, querystring. All fields are strings except querystring, which is a table containing name=value pairs.
- parse_www_authenticate (s)
Parses the WWW-Authenticate header as described in RFC 2616, section 14.47 and RFC 2617, section 1.2. The return value is an array of challenges. Each challenge is a table with the keys scheme and params.
Parameters
- s: The header value text.
Return value:
An array of challenges, or nil on error.
- pipeline_add (path, options, all_requests, method)
Adds a pending request to the HTTP pipeline. The HTTP pipeline is a set of requests that will all be sent at the same time, or as close as the server allows. This allows more efficient code, since requests are automatically buffered and sent simultaneously.
The all_requests argument contains the current list of queued requests (if this is the first time calling pipeline_add, it should be nil). After adding the request to the end of the queue, the queue is returned and can be passed to the next pipeline_add call.
When all requests have been queued, call pipeline_go with the all_requests table that has been built.
Parameters
- path: The path to retrieve.
- options: [optional] A table that lets the caller control socket timeouts, HTTP headers, and other parameters. For full documentation, see the module documentation (above).
- all_requests: [optional] The current pipeline queue (returned from a previous pipeline_add call), or nil if it's the first call.
- method: [optional] The HTTP method ('get', 'head', 'post', etc). Default: 'get'.
Return value:
A table with the previously queued pipeline requests plus this new one.
- pipeline_go (host, port, all_requests)
Performs all queued requests in the all_requests variable (created by the pipeline_add function). Returns an array of responses, each of which is a table as defined in the module documentation above.
Parameters
- host: The host to connect to.
- port: The port to connect to.
- all_requests: A table with all the previously built pipeline requests
Return value:
A list of responses, in the same order as the requests were queued. Each response is a table as described in the module documentation.
- post (host, port, path, options, ignored, postdata)
Fetches a resource with a POST request. Like get, this is a simple wrapper around generic_request except that postdata is handled properly.
Parameters
- host: The host to connect to.
- port: The port to connect to.
- path: The path to retrieve.
- options: [optional] A table that lets the caller control socket timeouts, HTTP headers, and other parameters. For full documentation, see the module documentation (above).
- ignored: Ignored for backwards compatibility.
- postdata: A string or a table of data to be posted. If a table, the keys and values must be strings, and they will be encoded into an application/x-www-form-urlencoded form submission.
Return value:
nil if an error occurs; otherwise, a table as described in the module documentation.
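A minimal sketch of a form submission with post. The function name submit_login, the path "/login.php", and the form field names are hypothetical. The http table is passed in only so the sketch can be tried with a stub outside Nmap.

```lua
-- Hypothetical helper: submit a login form and return the status code.
local function submit_login(http, host, port, username, password)
  -- Keys and values must be strings; the library encodes them as a form.
  local postdata = { username = username, password = password }
  -- The fifth argument is ignored and kept only for backwards compatibility.
  local response = http.post(host, port, "/login.php", nil, nil, postdata)
  if not response then
    return nil, "request failed"
  end
  return response.status
end
```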
- put (host, port, path, options, putdata)
Uploads a file using the PUT method and returns a result table. This is a simple wrapper around generic_request.
Parameters
- host: The host to connect to.
- port: The port to connect to.
- path: The path to retrieve.
- options: [optional] A table that lets the caller control socket timeouts, HTTP headers, and other parameters. For full documentation, see the module documentation (above).
- putdata: The contents of the file to upload
Return value:
nil if an error occurs; otherwise, a table as described in the module documentation.
- response_contains (response, pattern, case_sensitive)
Check if the response variable, which could be a return from http.get, http.post, http.pipeline_go, etc., contains the given text. The text can be:
- Part of a header ('content-type', 'text/html', '200 OK', etc)
- An entire header ('Content-type: text/html', 'Content-length: 123', etc)
- Part of the body
The search text is treated as a Lua pattern.
Parameters
- response: The full response table from an HTTP request.
- pattern: The pattern we're searching for. Don't forget to escape '-', for example, 'Content%-type'. The pattern can also contain captures, like 'abc(.*)def', which will be returned if successful.
- case_sensitive: [optional] Set to true for case-sensitive searches. Default: not case sensitive.
Return values:
- result: True if the string matched, false otherwise
- matches: An array of captures from the match, if any
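The reason '-' must be escaped can be shown with plain Lua string functions, independent of this library. In a Lua pattern, '-' after a character means "as few repetitions as possible", so the unescaped pattern below does not match the literal header text.

```lua
local header = "Content-type: text/html"

-- Unescaped: "t-" means zero or more 't' (lazy), so no literal '-' is matched.
local unescaped = header:find("Content-type")

-- Escaped: "%-" matches a literal hyphen; find returns the start index.
local escaped = header:find("Content%-type")

-- A capture returns the text between the two literals.
local captured = string.match("abc123def", "abc(.*)def")

-- Inside an NSE script, the same escaped pattern would be used as:
--   local found, matches = http.response_contains(response, "Content%-type")
```

With this input, unescaped is nil, escaped is 1, and captured is "123".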
- save_path (host, port, path, status, links_to, linked_from, contenttype)
This function should be called whenever a valid path (a path that doesn't contain a known 404 page) is discovered. It will add the path to the registry in several ways, allowing other scripts to take advantage of it in interesting ways.
Parameters
- host: The host the path was discovered on (not necessarily the host being scanned).
- port: The port the path was discovered on (not necessarily the port being scanned).
- path: The path discovered. Calling this more than once with the same path is okay; it'll update the data as much as possible instead of adding a duplicate entry
- status: [optional] The status code (200, 404, 500, etc). This can be left off if it isn't known.
- links_to: [optional] A table of paths that this page links to.
- linked_from: [optional] A table of paths that link to this page.
- contenttype: [optional] The content-type value for the path, if it's known.




