dmiller
8515e83671
Handle redirect URLs without a host, e.g. https:///path
2020-02-04 18:54:20 +00:00
dmiller
edb130e908
Replace some print calls with proper debug functions. See #1774
2019-10-07 03:13:09 +00:00
dmiller
0500811f5a
Move string utility functions to stringaux.lua
2018-10-18 01:08:19 +00:00
dmiller
807b66480a
Require extracted links to be within an HTML tag
httpspider was extracting "links" from javascript if there was a
variable called "src" or similar. By requiring an open HTML tag, we
eliminate this problem, still matching src, href, or action attributes
of any tag.
2018-03-09 19:07:49 +00:00
dmiller
b4f741c18b
httpspider.URL's tostring method returns normalized URL. See #1107
2018-03-09 19:07:47 +00:00
dmiller
502c082240
Don't bypass url.lua parsing in httpspider.
2018-02-28 03:43:12 +00:00
nnposter
fcac8c6e28
Removes dot and dot-dot path segments from parsed URLs
2018-02-26 00:27:36 +00:00
dmiller
1291626c1b
Use canonical ASCII host/domain name for withinhost/withindomain in httpspider
2017-09-28 04:31:31 +00:00
rewanth
984a670c4c
Removes ambiguous file extensions from httpspider.lua
2017-06-21 17:47:22 +00:00
rewanth
9104cbe810
Add missing file extensions to httpspider blacklist. Closes #860
2017-06-02 17:42:24 +00:00
nnposter
e80976a13a
Provides a common function, url.get_default_port(), for obtaining
the default port number for a given scheme. Fixes #781
2017-04-19 18:00:36 +00:00
nnposter
af6bbc35bb
Changes the port type returned from url.parse() to an actual integer, as
opposed to a string that represents an integer. Fixes #833, fixes #817.
2017-04-19 17:02:32 +00:00
dmiller
1a1dc0e47a
Fix some typos
2014-08-23 21:35:32 +00:00
batrick
ee6622aea4
nselib stdnse.print_debug -> stdnse.debug
$ f() { find -name \*.lua -exec /bin/echo sed -i "$1" {} \; ; }
$ f 's/stdnse.print_debug( *\([0-9]*\) *, */stdnse.debug\1(/'
$ f 's/stdnse.print_debug( *"\(.*\))/stdnse.debug1("\1)/'
2014-08-03 00:56:45 +00:00
sophron
efb73576e1
[NSE] A negative value should disable the maxpage limit according to NSEDoc.
2014-05-13 10:14:39 +00:00
dmiller
1b71f75aad
Spelling fixes for Lua files
Mostly in documentation/comments, but a couple code bugs were caught,
including a call to stdnse.pirnt_debug and a mis-declared variable.
2014-02-19 04:15:46 +00:00
dmiller
fb67a6717e
Re-indent some libs and scripts, change 4 to 2-space indent
Mostly found with:
for i in nselib/*.lua scripts/*.nse; do
echo $(perl -lne 'BEGIN{$a=$p=0}next unless $_;/^(\s*)/;' \
-e '$l=length$1;next if$l==$p;$a+=(abs($l-$p)-$a)/$.;' \
-e '$p=$l;END{print$a}' $i) $i
done | sort -nr
And indented with: https://gist.github.com/bonsaiviking/8845871
whois-ip.nse was particularly mangled (probably my fault due to using
vim's built-in indentation script, but it could be structured better)
2014-02-06 23:25:28 +00:00
dmiller
620f9fdb34
Remove trailing whitespace in lua files
Whitespace is not significant, so this should not be a problem.
https://secwiki.org/w/Nmap/Code_Standards
2014-01-23 21:51:58 +00:00
dmiller
db1d82ad1f
Fixed global assignments with nse_check_globals
All fixes made by hand. A couple real bugs/errors fixed, due to
copy-paste of code from other scripts without changing variable names.
2014-01-22 17:45:00 +00:00
sophron
683e83117b
[NSE] Convert these values to numeric (for example, when they are passed as command-line args).
2013-08-17 06:03:45 +00:00
sophron
1ecec300db
Allowed callbacks to 'withinhost' and 'withindomain' options and introduced 'doscraping' option.
2013-07-18 14:03:42 +00:00
sophron
28f2044442
Replaced tabs with spaces.
2013-07-18 13:58:25 +00:00
sophron
b9f35cbcac
Fixed syntax mistake.
2013-07-18 13:56:45 +00:00
sophron
ac4fe58a21
Added an option to turn off http caching while crawling.
2013-07-06 14:01:01 +00:00
patrik
e7cb28619e
fixed a bug where any url would be treated as withinhost due to a missing
return statement in the removewww function
2012-08-03 06:13:57 +00:00
patrik
6dc6b95377
fixed a bug in whitelisting code
2012-07-26 13:37:04 +00:00
tomsellers
b82c819afb
Update to add additional blacklist entries to the httpspider library. The goal is to avoid downloading and processing certain additional video, audio and binary formats.
This should speed up crawling certain sites. In the case of http-email-harvest it should reduce some of the false positives generated by running the RegEx against binary data. The only script whose results this appears likely to have affected is http-sitemap-generator, and that script specifically disables the blacklist.
2012-07-10 00:23:02 +00:00
perdo
7443db6f37
Hosts that differ only in the 'www' prefix are now treated as being equal. Also added some documentation for httpspider.useheadfornonwebfiles.
2012-07-03 21:48:26 +00:00
perdo
33c3838c45
Fixed a missing require in httpspider.
2012-07-01 09:45:14 +00:00
perdo
2730adc516
Modified the spidering library to allow using a HEAD rather than a GET request for files with certain extensions.
2012-06-25 17:54:38 +00:00
patrik
bb359adaa1
Played a round of nse_check_globals and fixed a bunch of reported problems.
2012-06-15 19:32:36 +00:00
patrik
cfdf67f8c7
Applied patch from Dan Miller that adds new suffixes and cleans up the
blacklisting code of the httpspider; http://seclists.org/nmap-dev/2012/q2/737
2012-06-15 10:17:09 +00:00
batrick
000f6dc4d9
Lua 5.2 upgrade [1] for NSE.
[1] http://seclists.org/nmap-dev/2012/q2/34
2012-05-27 08:53:32 +00:00
patrik
cbf901c195
added code to stop spidering if the base coroutine is dead.
2012-05-22 18:22:18 +00:00
patrik
84c3de36fc
Applied patch from Daniel Miller to fix two bugs in the httpspider library:
* First bug, the LinkExtractor portion of httpspider doesn't check for a negative
maxdepth (indicating no limit), and rejects all links.
* Second bug, the withinhost and withindomain matching functions would throw an error
when presented with a URL without a host portion.
In addition the validate_link function was moved out to a separate function in the
LinkExtractor Class. [Daniel Miller]
2012-05-22 17:26:12 +00:00
patrik
49078b178f
fixed deadlock when calling stop and the thread was already dead
2012-04-07 09:10:24 +00:00
patrik
49c3b4e84e
Adjusted link patterns to exclude leading and trailing whitespace in
the link extractor parsing function
2012-03-29 20:45:04 +00:00
patrik
3bfb56bbb1
bugfix for withindomain and withinhost checks.
2012-03-23 19:23:25 +00:00
patrik
344a39e3ac
Re-wrote withindomain and withinhost functions
2012-03-06 15:49:48 +00:00
patrik
fccccff960
* bugfixes to several http scripts related to new redirect code in http
library
* added option to httpspider that allows passing the redirect_ok closure to
the http library
[Patrik]
2012-02-11 22:37:14 +00:00
patrik
e8dad669ef
Fixed bug in redirection code reported by David. The redirect_ok function
would fail validating a location if the port passed to http.get or http.head
was a number and not a table. [Patrik]
2012-02-11 17:50:48 +00:00
patrik
557874588f
o [NSE] Modified the sql-injection script to use the httpspider library.
[Lauri Kokkonen]
2012-02-05 13:47:31 +00:00
patrik
2d55f8822c
Fixed a number of bugs and prepared the library to handle the new redirect
code being added to the http-library. [Patrik]
2012-02-02 21:23:19 +00:00
patrik
d4ca7dccfd
fixed bug that would fail reading url and options supplied to the Helper:new
method.
2012-01-28 19:29:32 +00:00
patrik
156e89c597
Fixed a bug that would incorrectly parse the url scheme [Patrik]
2011-12-17 19:45:48 +00:00
patrik
4214307364
o [NSE] Added the script http-grep that attempts to match web pages and urls
against a given pattern. [Patrik]
2011-12-11 19:44:26 +00:00
patrik
74b53a6a14
o [NSE] Added stop function to crawler so that scripts can properly shutdown
the crawler in case they want to end early. [Patrik]
2011-12-11 10:59:35 +00:00
patrik
8254da793e
o [NSE] Added getLimitations function to httpspider that returns any
limitations imposed on the crawler. [Patrik]
2011-12-10 10:11:56 +00:00
patrik
e20a1b5174
o [NSE] Modified the httpspider library to prefetch links in the queue and
change how script arguments are processed. Script and library arguments are
now processed from within the library. [Patrik]
2011-12-09 15:48:19 +00:00
patrik
682a9a746b
o [NSE] Added a new httpspider library and the script http-email-harvest that
collects e-mail addresses by spidering a website. [Patrik]
2011-12-06 22:47:11 +00:00