mirror of https://github.com/nmap/nmap.git synced 2025-12-06 04:31:29 +00:00

50 Commits

Author SHA1 Message Date
dmiller
8515e83671 Handle redirect URLs without a host, e.g. https:///path 2020-02-04 18:54:20 +00:00
dmiller
edb130e908 Replace some print calls with proper debug functions. See #1774 2019-10-07 03:13:09 +00:00
dmiller
0500811f5a Move string utility functions to stringaux.lua 2018-10-18 01:08:19 +00:00
dmiller
807b66480a Require extracted links to be within an HTML tag
httpspider was extracting "links" from javascript if there was a
variable called "src" or similar. By requiring an open HTML tag, we
eliminate this problem, still matching src, href, or action attributes
of any tag.
2018-03-09 19:07:49 +00:00
dmiller
b4f741c18b httpspider.URL's tostring method returns normalized URL. See #1107 2018-03-09 19:07:47 +00:00
dmiller
502c082240 Don't bypass url.lua parsing in httpspider. 2018-02-28 03:43:12 +00:00
nnposter
fcac8c6e28 Removes dot and dot-dot path segments from parsed URLs 2018-02-26 00:27:36 +00:00
dmiller
1291626c1b Use canonical ASCII host/domain name for withinhost/withindomain in httpspider 2017-09-28 04:31:31 +00:00
rewanth
984a670c4c Removes ambiguous file extensions from httpspider.lua 2017-06-21 17:47:22 +00:00
rewanth
9104cbe810 Add missing file extensions to httpspider blacklist. Closes #860 2017-06-02 17:42:24 +00:00
nnposter
e80976a13a Provides a common function, url.get_default_port(), for obtaining
the default port number for a given scheme. Fixes #781
2017-04-19 18:00:36 +00:00
nnposter
af6bbc35bb Changes the port type returned from url.parse() to an actual integer, as
opposed to a string that represents an integer. Fixes #833, fixes #817.
2017-04-19 17:02:32 +00:00
dmiller
1a1dc0e47a Fix some typos 2014-08-23 21:35:32 +00:00
batrick
ee6622aea4 nselib stdnse.print_debug -> stdnse.debug
$ f() { find -name \*.lua -exec /bin/echo sed -i "$1" {} \; ; }
$ f 's/stdnse.print_debug( *\([0-9]*\) *, */stdnse.debug\1(/'
$ f 's/stdnse.print_debug( *"\(.*\))/stdnse.debug1("\1)/'
2014-08-03 00:56:45 +00:00
sophron
efb73576e1 [NSE] A negative value should disable the maxpage limit according to NSEDoc. 2014-05-13 10:14:39 +00:00
dmiller
1b71f75aad Spelling fixes for Lua files
Mostly in documentation/comments, but a couple code bugs were caught,
including a call to stdnse.pirnt_debug and a mis-declared variable.
2014-02-19 04:15:46 +00:00
dmiller
fb67a6717e Re-indent some libs and scripts, change 4 to 2-space indent
Mostly found with:

    for i in nselib/*.lua scripts/*.nse; do
      echo $(perl -lne 'BEGIN{$a=$p=0}next unless $_;/^(\s*)/;' \
        -e '$l=length$1;next if$l==$p;$a+=(abs($l-$p)-$a)/$.;' \
        -e '$p=$l;END{print$a}' $i) $i
    done | sort -nr

And indented with: https://gist.github.com/bonsaiviking/8845871

whois-ip.nse was particularly mangled (probably my fault due to using
vim's built-in indentation script, but it could be structured better)
2014-02-06 23:25:28 +00:00
dmiller
620f9fdb34 Remove trailing whitespace in lua files
Whitespace is not significant, so this should not be a problem.
https://secwiki.org/w/Nmap/Code_Standards
2014-01-23 21:51:58 +00:00
dmiller
db1d82ad1f Fixed global assignments with nse_check_globals
All fixes made by hand. A couple real bugs/errors fixed, due to
copy-paste of code from other scripts without changing variable names.
2014-01-22 17:45:00 +00:00
sophron
683e83117b [NSE] Convert these values to numeric (for example, when they are passed as command line args). 2013-08-17 06:03:45 +00:00
sophron
1ecec300db Allowed callbacks to 'withinhost' and 'withindomain' options and introduced 'doscraping' option. 2013-07-18 14:03:42 +00:00
sophron
28f2044442 Replaced tabs with spaces. 2013-07-18 13:58:25 +00:00
sophron
b9f35cbcac Fixed syntax mistake. 2013-07-18 13:56:45 +00:00
sophron
ac4fe58a21 Added an option to turn off http caching while crawling. 2013-07-06 14:01:01 +00:00
patrik
e7cb28619e fixed a bug where any url would be treated as withinhost due to a missing
return statement in the removewww function
2012-08-03 06:13:57 +00:00
patrik
6dc6b95377 fixed a bug in whitelisting code 2012-07-26 13:37:04 +00:00
tomsellers
b82c819afb Update to add additional blacklist entries to the httpspider library. The goal is to avoid downloading and processing certain additional video, audio and binary formats.
This should speed up crawling certain sites.  In the case of http-email-harvest it should reduce some of the false positives generated by running the RegEx against binary data. The only script that this appears likely to have affected the results of would have been http-sitemap-generator and that script specifically disables the blacklist.
2012-07-10 00:23:02 +00:00
perdo
7443db6f37 Hosts that differ only on the 'www' prefix are now treated as being equal. Also added some documentation for httpspider.useheadfornonwebfiles. 2012-07-03 21:48:26 +00:00
perdo
33c3838c45 Fixed a missing require in httpspider. 2012-07-01 09:45:14 +00:00
perdo
2730adc516 Modified the spidering library to allow using a HEAD rather than GET request for files with certain extensions. 2012-06-25 17:54:38 +00:00
patrik
bb359adaa1 Played a round of nse_check_globals and fixed a bunch of reported problems. 2012-06-15 19:32:36 +00:00
patrik
cfdf67f8c7 Applied patch from Dan Miller that adds new suffixes and cleans up the
blacklisting code of the httpspider; http://seclists.org/nmap-dev/2012/q2/737
2012-06-15 10:17:09 +00:00
batrick
000f6dc4d9 Lua 5.2 upgrade [1] for NSE.
[1] http://seclists.org/nmap-dev/2012/q2/34
2012-05-27 08:53:32 +00:00
patrik
cbf901c195 added code to stop spidering if the base coroutine is dead. 2012-05-22 18:22:18 +00:00
patrik
84c3de36fc Applied patch from Daniel Miller to fix two bugs in the httpspider library:
  * First bug, the LinkExtractor portion of httpspider doesn't check for a negative
    maxdepth (indicating no limit), and rejects all links.
  * Second bug, the withinhost and withindomain matching functions would throw an error
    when presented with a URL without a host portion.

In addition the validate_link function was moved out to a separate function in the
LinkExtractor Class. [Daniel Miller]
2012-05-22 17:26:12 +00:00
patrik
49078b178f fixed deadlock when calling stop and the thread was already dead 2012-04-07 09:10:24 +00:00
patrik
49c3b4e84e Adjusted link patterns to exclude leading and trailing whitespace in
the link extractor parsing function
2012-03-29 20:45:04 +00:00
patrik
3bfb56bbb1 bugfix for withindomain and withinhost checks. 2012-03-23 19:23:25 +00:00
patrik
344a39e3ac Re-wrote withindomain and withinhost functions 2012-03-06 15:49:48 +00:00
patrik
fccccff960 * bugfixes to several http scripts related to new redirect code in http
  library
* added option to httpspider that allows passing the redirect_ok closure to
  the http library
[Patrik]
2012-02-11 22:37:14 +00:00
patrik
e8dad669ef Fixed bug in redirection code reported by David. The redirect_ok function
would fail validating a location if the port passed to http.get or http.head
was a number and not a table. [Patrik]
2012-02-11 17:50:48 +00:00
patrik
557874588f o [NSE] Modified the sql-injection script to use the httpspider library.
[Lauri Kokkonen]
2012-02-05 13:47:31 +00:00
patrik
2d55f8822c Fixed a number of bugs and prepared the library to handle the new redirect
code being added to the http-library. [Patrik]
2012-02-02 21:23:19 +00:00
patrik
d4ca7dccfd fixed bug that would fail reading url and options supplied to the Helper:new
method.
2012-01-28 19:29:32 +00:00
patrik
156e89c597 Fixed a bug that would incorrectly parse the url scheme [Patrik] 2011-12-17 19:45:48 +00:00
patrik
4214307364 o [NSE] Added the script http-grep that attempts to match web pages and urls
against a given pattern. [Patrik]
2011-12-11 19:44:26 +00:00
patrik
74b53a6a14 o [NSE] Added stop function to crawler so that scripts can properly shutdown
the crawler in case they want to end early. [Patrik]
2011-12-11 10:59:35 +00:00
patrik
8254da793e o [NSE] Added getLimitations function to httpspider that returns any
limitations imposed on the crawler. [Patrik]
2011-12-10 10:11:56 +00:00
patrik
e20a1b5174 o [NSE] Modified the httpspider library to prefetch links in the queue and
change how script arguments are processed. Script and library arguments are
  now processed from within the library. [Patrik]
2011-12-09 15:48:19 +00:00
patrik
682a9a746b o [NSE] Added a new httpspider library and the script http-email-harvest that
collects e-mail addresses by spidering a website. [Patrik]
2011-12-06 22:47:11 +00:00