to reuse an ACK ping probe from host detection during a SYN port scan. This can
greatly speed up a scan if the SYN scan finds only filtered ports.
One difficulty with implementing this is that not all ping probes are
appropriate for all scan types.
nmap -PA -sU scanme.nmap.org
would cache the ACK ping probe and send ACK pings during the UDP scan. But the
pcap filter for the UDP scan doesn't catch TCP packets, so the replies would
not be noticed and they would show up as dropped pings. Likewise,
nmap -PR -sS 192.168.0.1
would segfault when it tried to send an ARP ping during the SYN scan: the SYN
scan uses raw sockets, so the Ethernet descriptor was never initialized.
To fix this I added a function pingprobe_is_appropriate that determines whether
a given ping probe is appropriate for the current scan type. If not, the
constructor for HostScanStats just erases the ping probe.
More types of ping probes could be made "appropriate." TCP timing pings would
work during a UDP scan if the pcap filter were expanded to include TCP packets.
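A minimal standalone sketch of the idea (the types and fields here are
simplified stand-ins, not Nmap's real ones):

    #include <cstdio>

    /* Simplified stand-ins for Nmap's probe and scan types. */
    enum ProbeType { PROBE_TCP_ACK, PROBE_ARP, PROBE_ICMP_ECHO };
    enum ScanType  { SCAN_TCP_SYN, SCAN_UDP, SCAN_CONNECT };

    /* A cached ping probe is only reusable if the scan's send mechanism
     * and pcap filter can handle it: an ARP ping needs an initialized
     * Ethernet descriptor, and a TCP ping's replies must pass the scan's
     * pcap filter. */
    static bool pingprobe_is_appropriate(ProbeType probe, ScanType scan,
                                         bool have_eth_descriptor) {
      switch (probe) {
      case PROBE_ARP:
        return have_eth_descriptor; /* raw sockets can't send ARP */
      case PROBE_TCP_ACK:
        return scan != SCAN_UDP;    /* the UDP scan's filter drops TCP */
      default:
        return true;
      }
    }

    int main() {
      /* The HostScanStats constructor erases an inappropriate probe. */
      if (!pingprobe_is_appropriate(PROBE_ARP, SCAN_TCP_SYN, false))
        printf("erasing cached ARP ping probe\n");
      return 0;
    }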
In r8541 readip_pcap was given the ability to validate packets, and it also
returns a different length in some cases than it used to:
+ /* OK, since the IP header has been validated, we don't want to tell
+ * the caller they have more packet than they really have. This can
+ * be caused by the Ethernet CRC trailer being counted, for example.
+ */
+ if (*len > ntohs(iphdr->ip_len))
+ *len = ntohs(iphdr->ip_len);
which invalidated some tests that depended on the packet length. Those tests
were removed, but this one was missed.
from ICMP probes during a protocol scan (protoscanicmphack). I don't know why
it was NULL before, but that's wrong. It was probably never noticed because in
the case of a port update, all that happens is a failure to update the timing.
In the case of a ping probe, it would look like a dropped ping probe, but that
would be unlikely because protocol scans usually don't take very long. I
discovered it while testing code to allow ping probes to persist between host
discovery and port scanning.
packet is OK from the get-go rather than running basic checks of its own.
In a nutshell this patch checks to make sure:
1) there is enough room for an IP header in the amount of bytes read
2) the IP version number is correct
3) the IP length fields are at least as big as the standard header
4) the IP packet received isn't a fragment, or is the initial fragment
5) that next level headers seem reasonable
For TCP, this checks that there is enough room for the header in the number
of bytes read, and that any option lengths are correct. The options checked
are MSS, WScale, SackOK, Sack, and Timestamp.
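For illustration, a self-contained sketch of checks 1 through 4 using the
standard struct ip (the TCP option walking from check 5 is omitted):

    #include <cstddef>
    #include <arpa/inet.h>
    #include <netinet/ip.h>

    static bool looks_like_valid_ipv4(const unsigned char *buf, size_t len) {
      if (len < sizeof(struct ip))                /* 1) room for IP header */
        return false;
      const struct ip *iph = (const struct ip *) buf;
      if (iph->ip_v != 4)                         /* 2) correct IP version */
        return false;
      if (iph->ip_hl < 5 ||                       /* 3) length fields at   */
          ntohs(iph->ip_len) < iph->ip_hl * 4)    /*    least header-sized */
        return false;
      /* 4) not a fragment, or the initial fragment (offset == 0) */
      if ((ntohs(iph->ip_off) & IP_OFFMASK) != 0)
        return false;
      return true;
    }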
This also fixes a bug I discovered while testing. Since the Ethernet CRC
(and other datalink-layer data) could be read and counted, readip_pcap could
report more IP packet data than there really was. This didn't cause a buffer
overrun or anything, but garbage data could easily have been read instead of
real packet data. Now, when validity is checked for and the total number of
bytes read is larger than the IP packet's length, the returned length is
clamped to the IP header's total length field.
This seems to work great after doing what testing I could. It's been out on
nmap-dev for a couple of weeks without any bad reports (none at all for that
matter). I reviewed this patch again before committing and it looks good as
well.
doAnyOutstandingRetransmits performance improvements. Here is the log message from
r7914 in nmap-fixed-rate.
Keep a cache of the most recently processed probe for each host in
doAnyOutstandingRetransmits. This greatly reduces the amount of CPU used by
that function when the lists of outstanding probes grow long, such as when a
high scan rate is specified with --min-rate.
This is not the most efficient possible way this could be done, but it is a pretty
big win, and it's very non-invasive. The changes are limited entirely to
doAnyOutstandingRetransmits, with no new global state in ultra_scan.
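The shape of the optimization, as a self-contained sketch (the real code also
has to invalidate the cache when the remembered probe is destroyed; the names
here are illustrative):

    #include <list>
    #include <map>

    struct Probe { bool timed_out; };
    struct Host  { std::list<Probe *> outstanding; /* oldest first */ };

    /* Remember where the previous pass over each host's outstanding-probe
     * list stopped, instead of rewalking the list from the beginning on
     * every call. */
    static std::map<Host *, std::list<Probe *>::iterator> resume_point;

    static void retransmit_pass(Host *h) {
      std::list<Probe *>::iterator it = resume_point.count(h) ?
        resume_point[h] : h->outstanding.begin();
      for (; it != h->outstanding.end(); ++it) {
        if (!(*it)->timed_out)
          break;              /* ordered list: nothing newer has timed out */
        /* ... retransmit *it here ... */
      }
      resume_point[h] = it;   /* the next call picks up from here */
    }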
# nmap -d --min-rate 50000 -n -PN -p1-65535 --max-rtt-timeout 500 --max-retries 1 scanme.nmap.org
gprof before:
  %   cumulative   self              self     total
 time   seconds   seconds     calls  s/call  s/call  name
49.74     30.96    30.96      2709    0.01    0.02  doAnyOutstandingRetransmits(UltraScanInfo*)
10.51     37.50     6.54 127256413    0.00    0.00  std::_List_iterator<UltraProbe*>::operator--(int)
gprof after:
  %   cumulative   self              self     total
 time   seconds   seconds     calls  s/call  s/call  name
20.48      3.36     3.36      2667    0.00    0.00  doAnyOutstandingRetransmits(UltraScanInfo*)
16.21      6.02     2.66      2667    0.00    0.00  processData(UltraScanInfo*)
Note that 50000 packets per second is way excessive. I really only get about
6000 in practice. But the point is there is no huge CPU penalty for giving an
excessive rate.
Previously the ping probe data structures were stored in NmapOps;
now they are stored in the scan_lists struct. All other changes are
auxiliary to this reorganization.
only code left in Nmap that still uses rand() is in the Lua math
library. Perhaps at some point we'll need to expose high-quality random
numbers to Lua via our custom nmap library.
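If we ever do, the binding might look roughly like this sketch, which assumes
the standard Lua C API; nmap.random is a made-up name and std::random_device
stands in for whatever high-quality source we'd actually use:

    extern "C" {
    #include <lua.h>
    #include <lauxlib.h>
    }
    #include <random>

    /* Hypothetical nmap.random(): return one high-quality random value. */
    static int l_random(lua_State *L) {
      static std::random_device rd;
      lua_pushnumber(L, (lua_Number) rd());
      return 1;                 /* one value left on the Lua stack */
    }

    /* Registration table for the (hypothetical) nmap module. */
    static const luaL_Reg nmaplib[] = {
      { "random", l_random },
      { NULL, NULL },
    };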
guide. They don't honor scan delay and may violate congestion control.
Both of these things should be fixed. I was going to do it by having
get_next_target_probe just return the same probe multiple times, and
then either extend struct probespec to include a source address or have
sendIPScanProbe keep track of the decoy index and fill in source
addresses. But I was stopped by timing pings. Those should certainly be
decoyed, but in the code they are just sent as they are needed, and
don't have a dispatching function to modify. What would be good is a
global queue of probes waiting to be sent you could just insert all your
spoofed probes into, and then let the rest of the code take care of
scheduling them.
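Roughly what I have in mind, as a sketch (every name here is made up, and the
scheduling details are waved away):

    #include <queue>
    #include <netinet/in.h>

    /* One queue entry per (probe, source address) pair: a decoyed probe
     * becomes N entries differing only in source address. The scheduler
     * drains this queue under the usual timing and congestion-control
     * rules, and timing pings would be enqueued the same way. */
    struct QueuedProbe {
      struct in_addr src;       /* our real address or a decoy */
      /* probespec, destination, retry count, etc. would go here */
    };

    static std::queue<QueuedProbe> send_queue;

    static void enqueue_with_decoys(const struct in_addr *srcs, int nsrcs) {
      for (int i = 0; i < nsrcs; i++) {
        QueuedProbe q;
        q.src = srcs[i];
        send_queue.push(q);
      }
    }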
This change keeps a list of probes awaiting retransmit so that
doAnyOutstandingRetransmits doesn't have to search for them. At high
scan rates this function could take 100 ms or more. Now I have measured
it to take 2 ms or less.
The variable num_probes_waiting_retransmit has been renamed
num_probes_timed_out to better explain its purpose. The new list of probes
that can be retransmitted immediately is called
probes_waiting_retransmits, but not all timed-out probes can be
retransmitted immediately. I've done my best to explain the distinction
in comments.
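A compressed sketch of the bookkeeping, using the names from the text above
(the surrounding structure is invented):

    #include <list>

    struct Probe { bool timed_out; bool may_retransmit_now; };

    struct HostStats {
      /* Counts every timed-out probe, retransmittable or not. */
      int num_probes_timed_out;
      /* Only the subset that can be retransmitted immediately, so the
       * retransmit pass never has to search the full outstanding list. */
      std::list<Probe *> probes_waiting_retransmits;

      void mark_timed_out(Probe *p) {
        p->timed_out = true;
        num_probes_timed_out++;
        if (p->may_retransmit_now)   /* e.g. retry budget not exhausted */
          probes_waiting_retransmits.push_back(p);
      }
    };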
I thought long and hard about how to address this issue, and this is
what I decided on. But of course, every little optimization brings some
complexity and the chance of making a mistake. I'd appreciate someone
taking a look at this change.
found that five files can be open on Mac OS X: stdin, stdout, stderr, /dev/tty,
and /private/var/run/utmpx. This could cause a non-root scan at a high scan
rate to fail with the message "Too many open files". I was able to cause this
with "nmap --min-rate 5000 localhost -p-".
That command still fails with the same error message, but for an entirely
different reason. After a while, one of the connect calls fails with an errno of
22 (EINVAL, Invalid argument). Whatever that means, the socket doesn't get
closed; Nmap just reports a "Strange error from connect". The socket is still
open, but Nmap doesn't include it in its count of open sockets, so the count is
off by one (or conceivably more). That lets Nmap try to open one socket too
many and bomb out with the error message.
Note that running as non-root is important both because it uses a connect scan
and because non-root users have a lower limit on open files.
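For context, the socket budget for a connect scan is roughly the file
descriptor limit minus whatever is already open, along these lines (a sketch,
not Nmap's actual accounting):

    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/resource.h>

    /* Usable sockets = fd limit minus fds already open (stdin, stdout,
     * stderr, /dev/tty, utmpx, ...). Being off by even one in this count
     * is enough to hit "Too many open files" at high scan rates. */
    static int max_sockets_available() {
      struct rlimit rl;
      int already_open = 0;
      if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
      for (int fd = 0; fd < (int) rl.rlim_cur; fd++)
        if (fcntl(fd, F_GETFD) != -1)   /* fd is currently open */
          already_open++;
      return (int) rl.rlim_cur - already_open;
    }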
I've tried just closing the socket when EINVAL is returned, and that fixes the
problem. But that's likely to differ on different systems. Plus I don't know why
EINVAL is returned; maybe it's an OS bug. This only affects localhost scans and
only at high scan rates, so I'm leaving it alone.
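For the record, the workaround I tried looks roughly like this (a sketch of
uncommitted code; why connect returns EINVAL here is still unexplained):

    #include <cerrno>
    #include <cstdio>
    #include <cstring>
    #include <unistd.h>
    #include <sys/socket.h>

    /* If connect() fails with EINVAL, close the socket so the count of
     * open sockets stays accurate instead of drifting by one. */
    static int connect_and_count(int sd, const struct sockaddr *sa,
                                 socklen_t salen, int *num_open_sockets) {
      if (connect(sd, sa, salen) == -1 && errno == EINVAL) {
        fprintf(stderr, "Strange error from connect: %s\n", strerror(errno));
        close(sd);
        return -1;
      }
      (*num_open_sockets)++;
      return 0;
    }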
enough for host discovery, and 100 doesn't give much benefit because the probe
timeouts increase to slow the scan down. While it's faster in some cases, it
also increases the variance in scan times. For more analysis see
http://www.bamsoftware.com/wiki/Nmap/PerformanceGraphs#timeouts.
scales per-host congestion control increments in the same way those for the
group already are. This speeds scanning in some cases (particularly with few
hosts, when the group congestion control is not the limiting factor). I'm going
to experiment with raising the increment cap to allow this to have more of an
effect.
Scale host congestion control variables similarly to the way group congestion
control is scaled. For the rationale see
http://www.bamsoftware.com/wiki/Nmap/PerformanceGraphs#host-scaled.
Host cc_scale should use (numprobes_sent + numpings_sent), not (numprobes_sent + numprobes_sent).
Remove special-purpose log functions for graphing congestion control and other
things. There's enough information provided by -d3.
Update the congestion control graph program and add a program for graphing
probes and drops.
Increase the initial ccthresh from 50 to 75.
Change how much the congestion threshold drops on packet drops.
Print group timing stats with -d2 and individual host timing stats with -d3.
Bump up the cc-graph.sh y axis limit to 80.
Put graphs in the same directory as their log file.
Go ahead and adjust timing for ICMP destination unreachables. I'm going to
commit an experimental change to the congestion control that doesn't rely on
this any more.
Scale group congestion control increments by the inverse of the packet
receipt ratio. This gives great performance without ignoring ICMP
destination unreachable drops. This may be the breakthrough we've been
looking for.
I'll probably send a message about this later today. For information and
graphs right now, see
http://www.bamsoftware.com/wiki/Nmap/ResponseRateScaledCongestionControl.
Sorry it's only in my nmap-massping-migration branch for now, but please
give it a try.
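In code form, the scaling amounts to roughly this sketch (the cap of 50 is
the one mentioned just below, and the host denominator uses numprobes_sent +
numpings_sent per the fix noted earlier; the exact arithmetic may differ):

    #include <algorithm>

    /* Multiply congestion-window increments by the inverse of the packet
     * receipt ratio (probes sent over responses received). When ICMP
     * rate limiting suppresses replies, each reply that does arrive
     * grows the window proportionally more, so drops slow the scan
     * without being ignored outright. */
    static double cc_scale(int numprobes_sent, int numpings_sent,
                           int numresponses_received) {
      double sent = numprobes_sent + numpings_sent;
      if (numresponses_received <= 0)
        return 50.0;                               /* scaling factor cap */
      return std::min(sent / numresponses_received, 50.0);
    }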
Only -d2 is now needed for cc-graph.sh.
Put a cap of 50 on the cwnd scaling factor.
Fix up the order of things in the packet_ratio debugging output.
Move the packet_ratio debugging output to printAnyStats and rearrange the order
in which things are printed.
Put a header with the scan args at the top of the probes-graph.sh data files.
Add a function pcap_print_stats that shows the number of received and dropped
packets for a descriptor.
Call pcap_print_stats after a run of ultra_scan.
Increase the congestion window less aggressively than before with -T4 and -T5
(still more aggressively than with lesser timing values).
until now that Visual C++ made a bunch of whitespace changes in an otherwise
small diff. I'll re-commit the changes in a moment without the whitespace
changes.
Print group timing stats with -d2 and individual host timing stats with -d3.
Change how much the congestion threshold drops on packet drops.
Increase the initial ccthresh from 50 to 75.