Library httpspider
A small httpspider library providing basic spidering capabilities. It consists of the following classes:
Options
LinkExtractor
URL
UrlQueue
Crawler
The following sample code shows how the spider could be used:
local crawler = httpspider.Crawler:new(host, port, '/', { scriptname = SCRIPT_NAME })
crawler:set_timeout(10000)

local result
while true do
  local status, r = crawler:crawl()
  if not status then
    break
  end
  if r.response.body:match(str_match) then
    crawler:stop()
    result = r.url
    break
  end
end

return result
Author:
Patrik Karlsson <patrik@cqure.net>
Source: http://nmap.org/svn/nselib/httpspider.lua
Script Arguments
httpspider.url
The URL at which to start spidering, given relative to the scanned host, e.g. /default.html. (default: /)
httpspider.maxpagecount
The maximum number of pages to visit. A negative value disables the limit. (default: 20)
httpspider.noblacklist
If set, the default blacklist is not loaded.
httpspider.maxdepth
The maximum number of directories beneath the initial URL to spider. A negative value disables the limit. (default: 3)
httpspider.withinhost
Only spider URLs within the same host. (default: true)
httpspider.withindomain
Only spider URLs within the same domain. This widens the scope compared to withinhost, and the two options cannot be used in combination. (default: false)
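The scope and depth arguments above can be pictured with a small stand-alone Lua sketch. It is not taken from httpspider.lua: the helper names within_host, within_domain and depth_below are hypothetical, the domain is passed in explicitly rather than derived from the host, and the way depth is counted (directories between the initial URL's directory and the page) is an assumption about how maxdepth is meant to be read.

```lua
-- Illustrative sketch only: these helpers are hypothetical and are not
-- part of the httpspider library.

-- withinhost: the candidate must be exactly the scanned host.
local function within_host(base_host, candidate_host)
  return candidate_host:lower() == base_host:lower()
end

-- withindomain: the candidate may be the domain itself or any
-- subdomain of it. The domain is supplied by the caller here.
local function within_domain(domain, candidate_host)
  local host, dom = candidate_host:lower(), domain:lower()
  if host == dom then
    return true
  end
  local suffix = "." .. dom
  return #host > #suffix and host:sub(-#suffix) == suffix
end

-- maxdepth: number of directories that `path` lies beneath `base_dir`,
-- or nil when `path` is not located under `base_dir` at all.
local function depth_below(base_dir, path)
  if path:sub(1, #base_dir) ~= base_dir then
    return nil
  end
  local rest = path:sub(#base_dir + 1)
  local _, count = rest:gsub("/", "")
  return count
end

print(within_host("www.example.com", "mail.example.com"))   -- false
print(within_domain("example.com", "mail.example.com"))     -- true
print(depth_below("/", "/a/b/page.html"))                    -- 2
```

Under these assumptions, a link to mail.example.com found while scanning www.example.com would be skipped with the default withinhost setting but followed with withindomain, and /a/b/page.html would fall inside the default maxdepth of 3.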




