This is designed to solve the following problem: on Solaris 10 (and maybe
other platforms), doing a select on a pcap fd works, in that it reports the
fd readable when there are frames available to be read. However, after
finding the fd selectable and calling pcap_dispatch (or pcap_next, etc.),
libpcap may read more than one frame and buffer the extras internally. Later
calls to select will then report the fd as not readable, even though buffered
frames remain. So there may be a frame to be read, but you can't know without
calling pcap_dispatch to check, and that blocks indefinitely (on Solaris) if
you're wrong.
The way this works is that we do a non-blocking read on the pcap fd to see if
there is anything available. If not, we do a select with a timeout as usual.
(The select is there to enforce the timeout and to keep repeated
non-blocking reads from spinning the CPU.)
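
A minimal sketch of the strategy (the function name and callback plumbing
here are illustrative, not the actual code):

#include <pcap.h>
#include <sys/select.h>

/* Sketch only: try a non-blocking pcap read first. If libpcap has frames
   buffered internally, this delivers them even though select() would say
   the fd is not readable. Otherwise fall back to select() with a timeout. */
static int poll_then_select(pcap_t *p, int fd, struct timeval *timeout,
                            pcap_handler callback, u_char *user) {
  char errbuf[PCAP_ERRBUF_SIZE];

  pcap_setnonblock(p, 1, errbuf);
  int n = pcap_dispatch(p, -1, callback, user);
  pcap_setnonblock(p, 0, errbuf);
  if (n > 0)
    return n;  /* Buffered frames were waiting; no select needed. */

  /* Nothing buffered: select() enforces the timeout and keeps us from
     spinning the CPU on repeated non-blocking reads. */
  fd_set readfds;
  FD_ZERO(&readfds);
  FD_SET(fd, &readfds);
  if (select(fd + 1, &readfds, NULL, NULL, timeout) <= 0)
    return 0;  /* Timed out (or error): nothing this round. */
  return pcap_dispatch(p, -1, callback, user);
}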
I don't know whether this phenomenon affects platforms other than Solaris 10
(more specifically, platforms using DLPI for libpcap). The same workaround
may be safe, or even necessary, on other platforms, but I have limited it to
Solaris for now.
Solaris 11 uses BPF, not DLPI, for libpcap, but we can unconditionally follow
this code path on Solaris because BPF pcap fds can't be selected on.
This entails using names like std::vector and std::list rather than bare
vector and list, which was already the prevailing style. The immediate
cause of this is a header file on Solaris 10 that uses a "struct map"
that conflicts with std::map.
  In file included from struct_ip.h:40:0,
                   from tcpip.cc:108:
  /usr/include/net/if.h:99:9: error: template argument required for 'struct map'
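
A contrived reproduction of the clash (not the real header, just the shape
of the problem):

#include <map>
using namespace std;   // pulls std::map into the global namespace

// Contrived illustration, not the real <net/if.h>. With the using-directive
// in effect, the elaborated type "struct map" finds the std::map class
// template, which requires template arguments, hence the error above.
struct map *if_memmap; // error: template argument required for 'struct map'

// Writing std::map explicitly and avoiding "using namespace std" lets a
// plain C "struct map" and std::map coexist.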
Previously the "delta" variable, representing the measured RTT, was
clobbered in place to become srtt - delta in one branch. This was confusing
when a later output message printed "delta", which could have a
different meaning depending on which path was taken.
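
The shape of the fix, in a sketch with hypothetical names and the classic
TCP smoothing gains (not necessarily the exact constants used here):

#include <cstdio>
#include <cstdlib>

/* Hypothetical smoothing state; field and function names are illustrative. */
struct rtt_state { long srtt; long rttvar; };

static void update_rtt(struct rtt_state *s, long delta /* measured RTT, usec */) {
  /* Keep "delta" meaning the measured RTT for its whole lifetime; the
     smoothing difference gets its own variable instead of clobbering delta. */
  long diff = delta - s->srtt;
  s->srtt += diff / 8;                            /* EWMA update, gain 1/8 */
  s->rttvar += (std::labs(diff) - s->rttvar) / 4; /* deviation, gain 1/4 */
  /* This message now unambiguously reports the measured RTT. */
  std::printf("rtt delta=%ld srtt=%ld rttvar=%ld\n", delta, s->srtt, s->rttvar);
}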
nsock_tod is Nsock's idea of the current time. It is updated when an
nsock_pool is initialized, on each iteration of nsock_loop, and in a few
other places. What could go wrong, with respect to timers, is a sequence
like this:
  nsp_new
  [... some long delay ...]
  nsock_create_timer(timeout)
  nsock_loop
The time elapsed from the creation of the timer until it fires would
not be timeout, but rather timeout - delay. If the delay was long
enough, the timer would fire as soon as nsock_loop was entered.
This showed itself in IPv6 OS detection. We schedule 6 timers
immediately, 100 ms apart. If the pcap_open or anything else took too
long, then the timers would fire all at once. This messed up the
calculation of the TCP_ISR feature.
Perhaps we should update nsock_tod whenever any new event is created? It is
already done manually at the beginning of each of the connect functions.
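
A sketch of the idea; the refresh helper and expiration computation below
are illustrative, not nsock's actual internals:

#include <sys/time.h>

static struct timeval nsock_tod;   /* cached "now", like nsock's */

static void refresh_nsock_tod(void) {
  gettimeofday(&nsock_tod, NULL);
}

/* A timer's expiration is now + timeout; the fix is to refresh the cached
   time first, so a long delay since the last update doesn't eat into the
   timeout. */
static struct timeval timer_expiration(int timeout_ms) {
  refresh_nsock_tod();             /* don't trust a stale nsock_tod */
  struct timeval expire = nsock_tod;
  expire.tv_sec += timeout_ms / 1000;
  expire.tv_usec += (timeout_ms % 1000) * 1000;
  if (expire.tv_usec >= 1000000) { /* normalize carry into seconds */
    expire.tv_sec += 1;
    expire.tv_usec -= 1000000;
  }
  return expire;
}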
Previously it was taking a random u8 mod 100, which meant that the
numbers 0-55 were 50% more likely to come up than any others. Make it a
u16 instead, so that the numbers 0-35 are only about 0.15% more likely.
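
The arithmetic behind those figures, as a sketch (random_u16 here is a
stand-in for the real random source):

#include <cstdint>
#include <cstdlib>

/* Stand-in random source built from rand(); the real code uses its own. */
static uint16_t random_u16() {
  return (uint16_t)(((rand() & 0xFF) << 8) | (rand() & 0xFF));
}

static int random_percent() {
  /* u8 % 100: 256 = 2*100 + 56, so 0-55 each map from 3 of the 256 inputs
     and 56-99 from only 2, a 3/2 = 50% bias.
     u16 % 100: 65536 = 655*100 + 36, so 0-35 map from 656 inputs and 36-99
     from 655, about 656/655 - 1 = 0.15% bias. */
  return random_u16() % 100;
}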
Previously they were a flat list intermixing human-readable names and
CPE strings. Now they reflect the structure that we use to represent
them. In brief:
host.os = {
  {
    name = "Microsoft Windows XP",
    classes = {
      {
        vendor = "Microsoft",
        osfamily = "Windows",
        osgen = "XP",
        type = "general purpose",
        cpe = {
          "cpe:/o:microsoft:windows_xp"
        }
      },
      ... more classes ...
    },
  },
  ... more OS matches ...
}