Library http

Implements the HTTP client protocol in a standard form that Nmap scripts can take advantage of.

Because HTTP has so many uses, there are a number of interfaces to this library.

The most obvious and common ones are simply get, post, and head; or, if more control is required, generic_request can be used. These functions take host and port as their main parameters and they do what one would expect. The get_url helper function can be used to parse and retrieve a full URL.

HTTPS support is transparent. The library uses comm.tryssl to determine whether SSL is required for a request.

These functions return a table of values, including:

  • status-line - A string representing the status, such as "HTTP/1.1 200 OK", followed by a newline. In case of an error, a description will be provided in this line.
  • status - The HTTP status value; for example, "200". If an error occurs during a request, then this value is going to be nil.
  • version - HTTP protocol version string, as stated in the status line. Example: "1.1"
  • header - An associative array representing the header. Keys are all lowercase, and standard headers, such as 'date', 'content-length', etc. will typically be present.
  • rawheader - A numbered array of the headers, exactly as the server sent them. While header['content-type'] might be 'text/html', rawheader[3] might be 'Content-type: text/html'.
  • cookies - A numbered array of the cookies the server sent. Each cookie is a table with the expected keys, such as name, value, path, domain, and expires. This table can be sent back to the server in subsequent requests through the options table of any function (see below).
  • rawbody - The full body, as returned by the server. Chunked transfer encoding is handled transparently.
  • body - The full body, after processing the Content-Encoding header, if any. The Content-Encoding and Content-Length headers are adjusted to stay consistent with the processed body.
  • incomplete - Partially received response object, in case of an error.
  • truncated - A flag to indicate that the body has been truncated.
  • decoded - A list of processed named content encodings (like "identity" or "gzip").
  • undecoded - A list of named content encodings that could not be processed (due to lack of support or the body being corrupted for a given encoding). A body has been successfully decoded if this list is empty (or nil, if no encodings were used in the first place).
  • location - A numbered array of the locations of redirects that were followed.
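
As an illustration of how these fields fit together, the following sketch summarizes a response table. describe_response is a hypothetical helper, not part of the library; it relies only on the status, status-line, truncated, and undecoded fields described above, and it would typically be handed the return value of http.get or a similar call.

```lua
-- Hypothetical helper: summarize a response table using documented fields.
local function describe_response(response)
  if response == nil then
    return "no response"
  end
  if response.status == nil then
    -- On error, status is nil and status-line carries the description.
    return "error: " .. tostring(response["status-line"])
  end
  local parts = { "status " .. tostring(response.status) }
  if response.truncated then
    parts[#parts + 1] = "body truncated"
  end
  if response.undecoded and #response.undecoded > 0 then
    -- A non-empty undecoded list means the body was not fully decoded.
    parts[#parts + 1] = "undecoded: " .. table.concat(response.undecoded, ",")
  end
  return table.concat(parts, "; ")
end
```

Inside a script this might be used as stdnse.debug1("%s", describe_response(http.get(host, port, "/"))).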

Many of the functions optionally allow an "options" input table, which can modify the HTTP request or its processing in many ways like adding headers or setting the timeout. The following are valid keys in "options" (note: not all options will necessarily affect every function):

  • timeout: A timeout used for socket operations.
  • header: A table containing additional headers to be used for the request. For example, options['header']['Content-Type'] = 'text/xml'
  • content: The content of the message. This can be either a string, which will be directly added as the body of the message, or a table, which will have each key=value pair added (like a normal POST request). (A corresponding Content-Length header will be added automatically. Set header['Content-Length'] to override it).
  • cookies: A list of cookies as either a string, which will be directly sent, or a table. If it's a table, the following fields are recognized: name, value and path. Only name and value fields are required.
  • auth: A table containing the keys username and password, which will be used for HTTP Basic authentication. If a server requires HTTP Digest authentication, then there must also be a key digest, with value true. If a server requires NTLM authentication, then there must also be a key ntlm, with value true.
  • bypass_cache: Do not perform a lookup in the local HTTP cache.
  • no_cache: Do not save the result of this request to the local HTTP cache.
  • no_cache_body: Do not save the body of the response to the local HTTP cache.
  • max_body_size: Limit the received body to a specific number of bytes. Overrides script argument http.max-body-size. See the script argument for details.
  • truncated_ok: Do not treat an oversized body as an error. Overrides script argument http.truncated-ok.
  • any_af: Allow connecting to any address family, inet or inet6. By default, these functions will only use the same AF as nmap.address_family to resolve names. (This option is a straight pass-thru to comm.lua functions.)
  • redirect_ok: Closure that overrides the default redirect_ok used to validate whether to follow HTTP redirects or not. False, if no HTTP redirects should be followed. Alternatively, a number may be passed to change the number of redirects to follow. The following example shows how to write a custom closure that follows 5 consecutive redirects, without the safety checks in the default redirect_ok:
      redirect_ok = function(host,port)
        local c = 5
        return function(url)
          if ( c==0 ) then return false end
          c = c - 1
          return true
        end
      end
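
To show how several of these keys combine, here is a sketch of an options table. Only the key names come from the list above; every value is made up for illustration, and the timeout of 5000 assumes milliseconds, as with NSE socket timeouts.

```lua
-- Illustrative options table; every value here is a placeholder.
local options = {
  timeout = 5000,                        -- assumed to be in milliseconds
  header = { ["Accept"] = "text/html" }, -- extra request header
  cookies = {
    { name = "session", value = "abc123", path = "/" },
  },
  auth = { username = "admin", password = "secret", digest = true },
  bypass_cache = true,                   -- skip the cache lookup
  no_cache = true,                       -- and do not store the result
  redirect_ok = 2,                       -- follow at most two redirects
}
```

Such a table would be passed as the last argument of a call like http.get(host, port, "/", options).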
      

If a script is planning on making a lot of requests, the pipelining functions can be helpful. pipeline_add queues requests in a table, and pipeline_go performs the requests, returning the results as an array, with the responses in the same order as the requests were added. As a simple example:

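 -- Load the library, as an NSE script would do near the top of the file
 local http = require "http"

 -- Start by defining the 'all' variable as nil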
 local all = nil

 -- Add two GET requests and one HEAD to the queue but these requests are
 -- not performed yet. The second parameter represents the "options" table
 -- (which we don't need in this example).
 all = http.pipeline_add('/book',    nil, all)
 all = http.pipeline_add('/test',    nil, all)
 all = http.pipeline_add('/monkeys', nil, all, 'HEAD')

 -- Perform all three requests, in parallel as far as Nmap is able to
 local results = http.pipeline_go('nmap.org', 80, all)

At this point, results is an array with three elements. Each element is a table containing the HTTP result, as discussed above.

One more interface provided by the HTTP library helps scripts determine whether or not a page exists. The identify_404 function will try several URLs on the server to determine what the server's 404 pages look like. It will attempt to identify customized 404 pages that may not return the actual status code 404. If successful, the function page_exists can then be used to determine whether or not a page exists.
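
The two-step flow can be sketched as follows. path_exists is a hypothetical wrapper, and the http module is passed in as a parameter only so that the sketch can be exercised outside Nmap; in a real script it would be the value returned by require "http". The argument order follows the signatures documented further down.

```lua
-- Hypothetical wrapper around identify_404 and page_exists.
local function path_exists(http, host, port, path)
  -- Learn what this server's "not found" responses look like.
  local ok, result_404, known_404 = http.identify_404(host, port)
  if not ok then
    -- When status is false, the second value is an error message.
    return nil, result_404
  end
  local response = http.get(host, port, path)
  -- displayall = false: non-404 errors such as 500 are excluded.
  return http.page_exists(response, result_404, known_404, path, false)
end
```

A script would call it as path_exists(http, host, port, "/admin/") and treat a nil result as "this server cannot be checked reliably".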

Some other miscellaneous functions that can come in handy are response_contains, can_use_head, and save_path. See the appropriate documentation for details.

Source: https://svn.nmap.org/nmap/nselib/http.lua

Script Arguments

http.useragent

The value of the User-Agent header field sent with requests. By default it is "Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)". A value of the empty string disables sending the User-Agent header field.

http.host

The value to use in the Host header of all requests unless otherwise set. By default, the Host header uses the output of stdnse.get_hostname().

http.max-body-size

Limit the received body to a specific number of bytes. An oversized body results in an error unless script argument http.truncated-ok or request option truncated_ok is set to true. The default is 2097152 (2MB). Use value -1 to disable the limit altogether. This argument can be overridden case-by-case with request option max_body_size.

http.pipeline

If set, it represents the number of HTTP requests that'll be sent on one connection. This can be set low to make debugging easier, or it can be set high to test how a server reacts (its chosen max is ignored).

http.max-cache-size

The maximum memory size (in bytes) of the cache.

http.max-pipeline

If set, it represents the number of outstanding HTTP requests that should be sent together in a single burst. Defaults to http.pipeline (if set), or to what function get_pipeline_limit returns.

http.truncated-ok

Do not treat an oversized body as an error. (Use response object flag truncated to check if the returned body has been truncated.) This argument can be overridden case-by-case with request option truncated_ok.

Functions

can_use_head (host, port, result_404, path)

Determine whether or not the server supports HEAD.

clean_404 (body)

Try to remove anything that might change within a 404.

generic_request (host, port, method, path, options)

Do a single request with a given method. The response is returned as the standard response table (see the module documentation).

get (host, port, path, options)

Fetches a resource with a GET request and returns the result as a table.

get_status_string (data)

Take the data returned from an HTTP request and return the status string. Useful for stdnse.debug messages and even advanced output.

get_url (u, options)

Parses a URL and calls http.get with the result. The URL can contain all the standard fields, protocol://host:port/path

grab_forms (body)

Finds forms in HTML code.

head (host, port, path, options)

Fetches a resource with a HEAD request.

identify_404 (host, port)

Try requesting a non-existent file to determine how the server responds to unknown pages ("404 pages")

page_exists (data, result_404, known_404, page, displayall)

Determine whether or not the page that was returned is a 404 page.

parse_date (s)

Parses an HTTP date string

parse_form (form)

Parses a form, that is, finds its action and fields.

parse_redirect (host, port, path, response)

Handles an HTTP redirect.

parse_www_authenticate (s)

Parses the WWW-Authenticate header as described in RFC 2616, section 14.47 and RFC 2617, section 1.2.

pipeline_add (path, options, all_requests, method)

Adds a pending request to the HTTP pipeline.

pipeline_go (host, port, all_requests)

Performs all queued requests in the all_requests variable (created by the pipeline_add function).

post (host, port, path, options, ignored, postdata)

Fetches a resource with a POST request.

put (host, port, path, options, putdata)

Uploads a file using the PUT method and returns a result table. This is a simple wrapper around generic_request

redirect_ok (host, port, counter)

Provides the default behavior for HTTP redirects.

response_contains (response, pattern, case_sensitive)

Check if the response variable contains the given text.

save_path (host, port, path, status, links_to, linked_from, contenttype)

This function should be called whenever a valid path (a path that doesn't contain a known 404 page) is discovered.

tag_pattern (tag, endtag)

Create a pattern to find a tag

Functions

can_use_head (host, port, result_404, path)

Determine whether or not the server supports HEAD.

Tests by requesting / and verifying that it returns 200, and doesn't return data. We implement the check like this because we can't always rely on OPTIONS to tell the truth.

Note: If identify_404 returns a 200 status, HEAD requests should be disabled. Sometimes, servers use a 200 status code with a message explaining that the page wasn't found. In this case, to actually identify a 404 page, we need the full body that a HEAD request doesn't supply. This is determined automatically if the result_404 field is set.

Parameters

host
The host object.
port
The port to use.
result_404
[optional] The result when an unknown page is requested. This is returned by identify_404. If the 404 page returns a 200 code, then we disable HEAD requests.
path
The path to request; by default, / is used.

Return values:

  1. A boolean value: true if HEAD is usable, false otherwise.
  2. If HEAD is usable, the result of the HEAD request is returned (so potentially, a script can avoid an extra call to HEAD)
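
A minimal sketch of how the two return values might be used. probe is a hypothetical helper, and the http module is passed in as a parameter purely so the sketch can be tried outside Nmap (in a script it would come from require "http").

```lua
-- Hypothetical helper: prefer HEAD when the server handles it properly.
local function probe(http, host, port, path, result_404)
  local usable, head_response = http.can_use_head(host, port, result_404, path)
  if usable then
    -- The HEAD result is returned alongside the flag, so reuse it
    -- instead of issuing a second HEAD request.
    return head_response, "HEAD"
  end
  -- Fall back to a full GET, which also supplies the body.
  return http.get(host, port, path), "GET"
end
```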
clean_404 (body)

Try to remove anything that might change within a 404.

For example:

  • A file path (includes URI)
  • A time
  • A date
  • An execution time (numbers in general, really)

The intention is that two 404 pages from different URIs and taken hours apart should, whenever possible, look the same.

During this function, we're likely going to over-trim things. This is fine -- we want enough to match on that it'll a) be unique, and b) have the best chance of not changing. Even if we remove bits and pieces from the file, as long as it isn't a significant amount, it'll remain unique.

One case this doesn't cover is if the server generates a random haiku for the user.

Parameters

body
The body of the page.
generic_request (host, port, method, path, options)

Do a single request with a given method. The response is returned as the standard response table (see the module documentation).

The get, head, and post functions are simple wrappers around generic_request.

Any 1XX (informational) responses are discarded.

Parameters

host
The host to connect to.
port
The port to connect to.
method
The method to use; for example, 'GET', 'HEAD', etc.
path
The path to retrieve.
options
[optional] A table that lets the caller control socket timeouts, HTTP headers, and other parameters. For full documentation, see the module documentation (above).

Return value:

A response table, see module documentation for description.

get (host, port, path, options)

Fetches a resource with a GET request and returns the result as a table.

This is a simple wrapper around generic_request, with the added benefit of having local caching and support for HTTP redirects. Redirects are followed only if they pass all the validation rules of the redirect_ok function. This function may be overridden by supplying a custom function in the redirect_ok field of the options array. The default function follows a redirect only if the destination:

  • Is within the same host or domain
  • Has the same port number
  • Stays within the current scheme
  • Does not exceed the MAX_REDIRECT_COUNT limit on the number of redirects

Caching and redirects can be controlled in the options array, see module documentation for more information.

Parameters

host
The host to connect to.
port
The port to connect to.
path
The path to retrieve.
options
[optional] A table that lets the caller control socket timeouts, HTTP headers, and other parameters. For full documentation, see the module documentation (above).

Return value:

A response table, see module documentation for description.

get_status_string (data)

Take the data returned from an HTTP request and return the status string. Useful for stdnse.debug messages and even advanced output.

Parameters

data
The response table from any HTTP request

Return value:

The best status string we could find: either the actual status string, the status code, or "<unknown status>".
get_url (u, options)

Parses a URL and calls http.get with the result. The URL can contain all the standard fields, protocol://host:port/path

Parameters

u
The URL of the host.
options
[optional] A table that lets the caller control socket timeouts, HTTP headers, and other parameters. For full documentation, see the module documentation (above).

Return value:

A response table, see module documentation for description.

grab_forms (body)

Finds forms in HTML code.

Returns a table of the forms found, in plaintext.

Parameters

body
A response.body in which to search for forms

Return value:

A list of forms.

head (host, port, path, options)

Fetches a resource with a HEAD request.

Like get, this is a simple wrapper around generic_request with response caching. This function also has support for HTTP redirects. Redirects are followed only if they pass all the validation rules of the redirect_ok function. This function may be overridden by supplying a custom function in the redirect_ok field of the options array. The default function follows a redirect only if the destination:

  • Is within the same host or domain
  • Has the same port number
  • Stays within the current scheme
  • Does not exceed the MAX_REDIRECT_COUNT limit on the number of redirects

Caching and redirects can be controlled in the options array, see module documentation for more information.

Parameters

host
The host to connect to.
port
The port to connect to.
path
The path to retrieve.
options
[optional] A table that lets the caller control socket timeouts, HTTP headers, and other parameters. For full documentation, see the module documentation (above).

Return value:

A response table, see module documentation for description.

identify_404 (host, port)

Try requesting a non-existent file to determine how the server responds to unknown pages ("404 pages")

This tells us

  • what to expect when a non-existent page is requested, and
  • if the server will be impossible to scan.

If the server responds with a 404 status code, as it is supposed to, then this function simply returns 404. If it instead responds with one of a series of other common status codes, including unauthorized, moved, and others, that code is returned in the same way as a 404.

I (Ron Bowes) have observed one host that responds differently for three scenarios:

  • A non-existent page, all lowercase (a login page)
  • A non-existent page, with uppercase (a weird error page that says, "Filesystem is corrupt.")
  • A page in a non-existent directory (a login page with different font colours)

As a result, I've devised three different 404 tests, one to check each of these conditions. If they all match, the tests can proceed; if any of them differ, we can't check 404s properly.

Parameters

host
The host object.
port
The port to which we are establishing the connection.

Return values:

  1. status Did we succeed?
  2. result If status is false, result is an error message. Otherwise, it's the code to expect (typically, but not necessarily, '404').
  3. body Body is a hash of the cleaned-up body that can be used when detecting a 404 page that doesn't return a 404 error code.
page_exists (data, result_404, known_404, page, displayall)

Determine whether or not the page that was returned is a 404 page.

This is actually a pretty simple function, but it's best to keep this logic close to identify_404, since they will generally be used together.

Parameters

data
The data returned by the HTTP request
result_404
The status code to expect for non-existent pages. This is returned by identify_404.
known_404
The 404 page itself, if result_404 is 200. If result_404 is something else, this parameter is ignored and can be set to nil. This is returned by identify_404.
page
The page being requested (used in error messages).
displayall
[optional] If set to true, don't exclude non-404 errors (such as 500).

Return value:

A boolean value: true if the page appears to exist, and false if it does not.
parse_date (s)

Parses an HTTP date string

Supports any of the following formats from section 3.3.1 of RFC 2616:

  • Sun, 06 Nov 1994 08:49:37 GMT (RFC 822, updated by RFC 1123)
  • Sunday, 06-Nov-94 08:49:37 GMT (RFC 850, obsoleted by RFC 1036)
  • Sun Nov 6 08:49:37 1994 (ANSI C's asctime() format)

Parameters

s
the date string.

Return value:

a table with keys year, month, day, hour, min, sec, and isdst, relative to GMT, suitable for input to os.time.
parse_form (form)

Parses a form, that is, finds its action and fields.

Parameters

form
A plaintext representation of form

Return value:

A dictionary with the keys action, method (if one is specified), and fields. The fields value is a list of the fields found in the form, each of which has a name attribute and, if specified, a type.
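
grab_forms and parse_form are designed to be chained, which the following sketch illustrates. list_form_actions is a hypothetical helper, and the http module is passed in as a parameter only so the sketch can be run outside Nmap; the guard against a nil list is a defensive assumption rather than documented behavior.

```lua
-- Hypothetical helper: collect the action of every form in a page body.
local function list_form_actions(http, body)
  local actions = {}
  for _, form in ipairs(http.grab_forms(body) or {}) do
    local parsed = http.parse_form(form)
    -- Skip forms that could not be parsed or that have no action.
    if parsed and parsed.action then
      actions[#actions + 1] = parsed.action
    end
  end
  return actions
end
```

In a script, body would usually be response.body from an earlier http.get call.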
parse_redirect (host, port, path, response)

Handles an HTTP redirect.

Parameters

host
table as received by the script action function
port
table as received by the script action function
path
string
response
table as returned by http.get or http.head

Return value:

url table as returned by url.parse or nil if there's no redirect taking place
parse_www_authenticate (s)

Parses the WWW-Authenticate header as described in RFC 2616, section 14.47 and RFC 2617, section 1.2.

The return value is an array of challenges. Each challenge is a table with the keys scheme and params.

Parameters

s
The header value text.

Return value:

An array of challenges, or nil on error.
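
Since each challenge carries a scheme key, picking out a particular scheme is a short loop. find_challenge below is a hypothetical helper that works on the returned array; it treats a nil result as an empty list and makes no assumption about the contents of params.

```lua
-- Hypothetical helper: return the first challenge using the given scheme.
local function find_challenge(challenges, scheme)
  for _, challenge in ipairs(challenges or {}) do
    -- Compare case-insensitively, since servers vary in capitalization.
    if challenge.scheme:lower() == scheme:lower() then
      return challenge
    end
  end
  return nil
end
```

A script might call find_challenge(http.parse_www_authenticate(value), "Digest"), where value is the text of the WWW-Authenticate header, before deciding whether to set digest = true in the auth option.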
pipeline_add (path, options, all_requests, method)

Adds a pending request to the HTTP pipeline.

The HTTP pipeline is a set of requests that will all be sent at the same time, or as close as the server allows. This allows more efficient code, since requests are automatically buffered and sent simultaneously.

The all_requests argument contains the current list of queued requests (if this is the first time calling pipeline_add, it should be nil). After adding the request to the end of the queue, the queue is returned and can be passed to the next pipeline_add call.

When all requests have been queued, call pipeline_go with the all_requests table that has been built.

Parameters

path
The path to retrieve.
options
[optional] A table that lets the caller control socket timeouts, HTTP headers, and other parameters. For full documentation, see the module documentation (above).
all_requests
[optional] The current pipeline queue (returned from a previous pipeline_add call), or nil if it's the first call.
method
[optional] The HTTP method ('GET', 'HEAD', 'POST', etc). Default: 'GET'.

Return value:

Table with the pipeline requests (plus this new one)

pipeline_go (host, port, all_requests)

Performs all queued requests in the all_requests variable (created by the pipeline_add function).

Returns an array of responses, each of which is a table as defined in the module documentation above.

Parameters

host
The host to connect to.
port
The port to connect to.
all_requests
A table with all the previously built pipeline requests

Return value:

A list of responses, in the same order as the requests were queued. Each response is a table as described in the module documentation. The response list may be either nil or shorter than expected (up to and including being completely empty) due to communication issues or other errors.
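
Because the list may be nil or short, callers should not assume one response per queued request. The sketch below shows one defensive way to walk the results; pipeline_statuses is a hypothetical helper, and expected is the number of requests the caller queued.

```lua
-- Hypothetical helper: collect status codes and count missing responses.
local function pipeline_statuses(results, expected)
  local statuses = {}
  -- Treat a nil result list the same as an empty one.
  for i, response in ipairs(results or {}) do
    -- Store false rather than nil so the list has no holes.
    statuses[i] = response.status or false
  end
  return statuses, expected - #statuses
end
```

With the three-request example from the module overview, this would be called as pipeline_statuses(results, 3); a non-zero second return value signals that some requests went unanswered.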
post (host, port, path, options, ignored, postdata)

Fetches a resource with a POST request.

Like get, this is a simple wrapper around generic_request except that postdata is handled properly.

Parameters

host
The host to connect to.
port
The port to connect to.
path
The path to retrieve.
options
[optional] A table that lets the caller control socket timeouts, HTTP headers, and other parameters. For full documentation, see the module documentation (above).
ignored
Ignored for backwards compatibility.
postdata
A string or a table of data to be posted. If a table, the keys and values must be strings, and they will be encoded into an application/x-www-form-urlencoded form submission.

Return value:

A response table, see module documentation for description.
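
The position of the ignored parameter is easy to get wrong, so here is a small sketch of a call with table postdata. submit_form is a hypothetical wrapper, and the http module is passed in as a parameter only so the sketch can be exercised outside Nmap.

```lua
-- Hypothetical wrapper: POST a table of string fields as a form submission.
local function submit_form(http, host, port, path, fields)
  -- Fourth argument: options (none here).
  -- Fifth argument: ignored, kept for backwards compatibility.
  -- Sixth argument: the data to post.
  return http.post(host, port, path, nil, nil, fields)
end
```

For example, submit_form(http, host, port, "/login", { username = "guest", password = "guest" }), where the path and field names are placeholders.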

put (host, port, path, options, putdata)

Uploads a file using the PUT method and returns a result table. This is a simple wrapper around generic_request

Parameters

host
The host to connect to.
port
The port to connect to.
path
The path to retrieve.
options
[optional] A table that lets the caller control socket timeouts, HTTP headers, and other parameters. For full documentation, see the module documentation (above).
putdata
The contents of the file to upload

Return value:

A response table, see module documentation for description.

redirect_ok (host, port, counter)

Provides the default behavior for HTTP redirects.

Redirects will be followed unless they:

  • contain credentials
  • are on a different domain or host
  • have a different port number or URI scheme
  • redirect to the same URI
  • exceed the maximum number of redirects specified

Parameters

host
table as received by the action function
port
table as received by the action function
counter
number of redirects to follow.

Return value:

a default closure suitable for option "redirect_ok"
response_contains (response, pattern, case_sensitive)

Check if the response variable contains the given text.

The response variable can be the return value of http.get, http.post, http.pipeline_go, etc. The text can be:

  • Part of a header ('content-type', 'text/html', '200 OK', etc)
  • An entire header ('Content-type: text/html', 'Content-length: 123', etc)
  • Part of the body

The search text is treated as a Lua pattern.

Parameters

response
The full response table from an HTTP request.
pattern
The pattern we're searching for. Don't forget to escape '-', for example, 'Content%-type'. The pattern can also contain captures, like 'abc(.*)def', which will be returned if successful.
case_sensitive
[optional] Set to true for case-sensitive searches. Default: not case sensitive.

Return values:

  1. result True if the string matched, false otherwise
  2. matches An array of captures from the match, if any
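
The reason '-' needs escaping is that it is a repetition operator in Lua patterns. The sketch below demonstrates this with the standard string library alone, on a made-up header line; it does not call response_contains itself.

```lua
-- A raw header line of the kind found in rawheader (example value).
local raw = "Content-type: text/html"

-- Unescaped, "t-" means "as few 't' characters as possible",
-- so the literal hyphen in the header is never matched.
local unescaped = raw:find("Content-type")

-- Escaped with '%', the hyphen is matched literally.
local first, last = raw:find("Content%-type")

-- A capture returns just the part inside the parentheses.
local value = raw:match("Content%-type: (.*)")
```

Here unescaped is nil, first and last are 1 and 12, and value is "text/html".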
save_path (host, port, path, status, links_to, linked_from, contenttype)

This function should be called whenever a valid path (a path that doesn't contain a known 404 page) is discovered.

It will add the path to the registry in several ways, allowing other scripts to take advantage of it in interesting ways.

Parameters

host
The host the path was discovered on (not necessarily the host being scanned).
port
The port the path was discovered on (not necessarily the port being scanned).
path
The path discovered. Calling this more than once with the same path is okay; it'll update the data as much as possible instead of adding a duplicate entry
status
[optional] The status code (200, 404, 500, etc). This can be left off if it isn't known.
links_to
[optional] A table of paths that this page links to.
linked_from
[optional] A table of paths that link to this page.
contenttype
[optional] The content-type value for the path, if it's known.
tag_pattern (tag, endtag)

Create a pattern to find a tag

Case-insensitive search for tags

Parameters

tag
The name of the tag to find
endtag
Boolean true if you are looking for an end tag, otherwise it will look for a start tag

Return value:

A pattern to find the tag