patrik
6dc6b95377
fixed a bug in whitelisting code
2012-07-26 13:37:04 +00:00
tomsellers
b82c819afb
Update to add additional blacklist entries to the httpspider library. The goal is to avoid downloading and processing certain additional video, audio and binary formats.
This should speed up crawling certain sites. In the case of http-email-harvest it should reduce some of the false positives generated by running the RegEx against binary data. The only script that this appears likely to have affected the results of would have been http-sitemap-generator and that script specifically disables the blacklist.
2012-07-10 00:23:02 +00:00
perdo
7443db6f37
Hosts that differ only on the 'www' prefix are now treated as being equal. Also added some documentation for httpspider.useheadfornonwebfiles.
2012-07-03 21:48:26 +00:00
perdo
33c3838c45
Fixed a missing require in httpspider.
2012-07-01 09:45:14 +00:00
perdo
2730adc516
Modified the spidering library to allow using a HEAD rather than a GET request for files with certain extensions.
2012-06-25 17:54:38 +00:00
patrik
bb359adaa1
Played a round of nse_check_globals and fixed a bunch of reported problems.
2012-06-15 19:32:36 +00:00
patrik
cfdf67f8c7
Applied patch from Dan Miller that adds new suffixes and cleans up the
blacklisting code of the httpspider; http://seclists.org/nmap-dev/2012/q2/737
2012-06-15 10:17:09 +00:00
batrick
000f6dc4d9
Lua 5.2 upgrade [1] for NSE.
[1] http://seclists.org/nmap-dev/2012/q2/34
2012-05-27 08:53:32 +00:00
patrik
cbf901c195
added code to stop spidering if the base coroutine is dead.
2012-05-22 18:22:18 +00:00
patrik
84c3de36fc
Applied patch from Daniel Miller to fix two bugs in the httpspider library:
* First bug, the LinkExtractor portion of httpspider doesn't check for a negative
maxdepth (indicating no limit), and rejects all links.
* Second bug, the withinhost and withindomain matching functions would throw an error
when presented with a URL without a host portion.
In addition the validate_link function was moved out to a separate function in the
LinkExtractor Class. [Daniel Miller]
2012-05-22 17:26:12 +00:00
patrik
49078b178f
fixed deadlock when calling stop and the thread was already dead
2012-04-07 09:10:24 +00:00
patrik
49c3b4e84e
Adjusted link patterns to exclude leading and trailing whitespace in
the link extractor parsing function
2012-03-29 20:45:04 +00:00
patrik
3bfb56bbb1
bugfix for withindomain and withinhost checks.
2012-03-23 19:23:25 +00:00
patrik
344a39e3ac
Re-wrote withindomain and withinhost functions
2012-03-06 15:49:48 +00:00
patrik
fccccff960
* bugfixes to several http scripts related to new redirect code in http
library
* added option to httpspider that allows passing the redirect_ok closure to
the http library
[Patrik]
2012-02-11 22:37:14 +00:00
patrik
e8dad669ef
Fixed bug in redirection code reported by David. The redirect_ok function
would fail validating a location if the port passed to http.get or http.head
was a number and not a table. [Patrik]
2012-02-11 17:50:48 +00:00
patrik
557874588f
o [NSE] Modified the sql-injection script to use the httpspider library.
[Lauri Kokkonen]
2012-02-05 13:47:31 +00:00
patrik
2d55f8822c
Fixed a number of bugs and prepared the library to handle the new redirect
code being added to the http-library. [Patrik]
2012-02-02 21:23:19 +00:00
patrik
d4ca7dccfd
fixed bug that would fail reading url and options supplied to the Helper:new
method.
2012-01-28 19:29:32 +00:00
patrik
156e89c597
Fixed a bug that would incorrectly parse the url scheme [Patrik]
2011-12-17 19:45:48 +00:00
patrik
4214307364
o [NSE] Added the script http-grep that attempts to match web pages and urls
against a given pattern. [Patrik]
2011-12-11 19:44:26 +00:00
patrik
74b53a6a14
o [NSE] Added stop function to crawler so that scripts can properly shutdown
the crawler in case they want to end early. [Patrik]
2011-12-11 10:59:35 +00:00
patrik
8254da793e
o [NSE] Added getLimitations function to httpspider that returns any
limitations imposed on the crawler. [Patrik]
2011-12-10 10:11:56 +00:00
patrik
e20a1b5174
o [NSE] Modified the httpspider library to prefetch links in the queue and
change how script arguments are processed. Script and library arguments are
now processed from within the library. [Patrik]
2011-12-09 15:48:19 +00:00
patrik
682a9a746b
o [NSE] Added a new httpspider library and the script http-email-harvest that
collects e-mail addresses by spidering a website. [Patrik]
2011-12-06 22:47:11 +00:00