diff --git a/docs/scripting.xml b/docs/scripting.xml index d450e8f49..23629ef81 100644 --- a/docs/scripting.xml +++ b/docs/scripting.xml @@ -1636,100 +1636,6 @@ socket:close() - - Thread Mutexes - threads in NSE - mutexes in NSE - - Each script execution thread (e.g. ftp-anon running against an FTP server on the target host) yields to other - scripts whenever it makes a call on network objects (sending or receiving - data). Some scripts require finer concurrency control over thread execution. An - example is the whois script which queries - whoiswhois - servers for each target IP address. Because many concurrent queries often result in - getting one's IP banned for abuse, and because a single query may return additional - information for targets other threads are running against, it is useful - to have other threads pause while one thread performs a query. - - - To solve this problem, NSE includes a - mutex function which provides a - mutex - (mutual exclusion object) usable by scripts. The mutex allows - for only one thread to be working on an object. Competing threads - waiting to work on this object are put in the waiting queue - until they can get a "lock" on the mutex. A solution for - the whois problem above is to have each - thread block on a mutex using a common string, thus ensuring - that only one thread is querying whois servers at once. That - thread can store the results in the NSE registry before - releasing unlocking the mutex. The next script in the waiting - queue can then run. It will first check the registry and only - query whois servers if the previous results were insufficient. - - - The first step is to create a mutex object using a statement such as: - - mutexfn = nmap.mutex(object) - - The mutexfn returned is a function - which works as a mutex for the object passed - in. This object can be any - Lua data - type except nil, - booleans, and numbers. - The returned function allows you to lock, try to lock, and - release the mutex. 
Its first and only parameter must be one of - the following: - - - - - "lock" - Make a blocking lock on the mutex. If the mutex is busy (another thread has a lock on it), then the thread will yield and wait. The function returns with the mutex locked. - - - - "trylock" - Makes a non-blocking lock on the mutex. If the mutex is - busy then it immediately returns with a return value of - false. Otherwise the mutex locks the - mutex and returns true. - - - - "done" - Releases the mutex and allows - another thread to lock it. If the thread does not have a lock on the mutex, an - error will be raised. - - - - "running" - Returns the thread locked - on the mutex or nil if the mutex is not - locked. This should only be used for debugging as it - interferes with garbage collection of finished threads. - - - -A simple example of using the API is provided in . For real-life examples, read the asn-query.nse and whois.nse scripts in the Nmap distribution. - - - Mutex manipulation - -local mutex = nmap.mutex("My Script's Unique ID"); -function action(host, port) - mutex "lock"; - -- Do critical section work - only one thread at a time executes this. - mutex "done"; - return script_output; -end - - - - Exception Handling exceptions in NSE @@ -2365,6 +2271,547 @@ categories = {"discovery", "external"} + + Script Parallelism in NSE + + Before now, we have only lightly touched on the steps NSE takes to allow + multiple scripts to execute in parallel. Usually, the author need not + concern himself with how any of this is implemented; however, there are a + couple cases that warrant discussion that we will cover in this section. + As a script writer, you may need to control how multiple scripts interact + in a library; you may require multiple threads to work in parallel; or + perhaps you need to serialize access to a remote resource. + + + The standard mechanism for parallel execution is a thread. 
A thread + encapsulates execution flow and data of a script using the Lua + thread or coroutine. A Lua thread + allows us to yield the current script at arbitrary points to continue + work on another script. Typically, these yield points are blocking calls + to the NSE Socket library. The yield back to NSE is also transparent; the + script is unaware of the transition and views each socket method as a + blocking call. + + + Let's go over some common terminology. A script is + analogous to a binary executable; it holds the information necessary to + execute our script. A thread (a Lua coroutine) is + analogous to a process; it runs a script against a host and possibly + port. We sometimes abuse our terminology throughout the book by referring + to a thread as a running script. We are really saying the "instantiation + of the script", in the same sense that a process is the instantiation of + an executable. + + + NSE provides the bare-bone essentials you need to expand your degree + of parallelism beyond the basic script thread: new independent threads, + Mutexes, and Condition Variables. We will go into depth on each of + these mechanisms in the following sections. + + + Worker Threads + + There are several instances where a script needs finer control with + respect to parallel execution beyond what is offered by default with a + generic script. The common reason for this need is the inability for a + script to read from multiple sockets concurrently. For example, an HTTP + spidering script may want to have multiple Lua threads querying web + server resources in parallel. To solve this problem, NSE offers the + function stdnse.new_thread to create worker threads. + These worker threads have all the power of independent scripts with the + only restriction that they may not report Script Output. 
+ + + Each worker thread launched by a script is given a main function and + a variable number of arguments to be passed to the main function by + NSE: + + + worker_thread, status_function = stdnse.new_thread(main, ...) + + + You are given back the Lua thread (coroutine) that uniquely identifies + your worker thread and a status query function that queries the status + of your new worker. + + + The status query function returns two values: + + + status, error_object = status_function() + + + The first return value, status, is simply the return + value of coroutine.status on the worker thread + coroutine (more precisely, the base coroutine, read + more about base coroutine in ). The second return value contains + the error object thrown that ended the worker thread or + nil if no error was thrown. This object is typically + a string, like most Lua errors. However, recall that any Lua type can + be an error object, even nil! You should + inspect the error object, the second return value, only if the status + of your worker is "dead". + + + NSE discards all return values from the main function when the worker + thread finishes execution. You should communicate with your worker + through the use of main function parameters, + upvalues, or function environments. You will see how to do this in + . + + + Finally, when using worker threads you should always use condition + variables and Mutexes to coordinate with your worker threads. Keep in + mind that Nmap is single threaded so there are no (memory) issues in + synchronization to worry about; however, there is resource + contention. Your resources are usually network bandwidth, network + sockets, etc. Condition variables are also useful if the work for any + single thread is dynamic. For example, a web server spider script with + a pool of workers will initially have a single root html document. 
+ Following the retrieval of the root document, the set of resources to + be retrieved (the worker's work) will become very large (an html + document adds many new hyperlinks (resources) to fetch). + + + Worker Thread Example + +local requests = {"/", "/index.html", --[[ long list of objects ]]} + +function thread_main (host, port, responses, ...) + local condvar = nmap.condvar(responses); + local what = {n = select("#", ...), ...}; + local allReqs = nil; + for i = 1, what.n do + allReqs = http.pGet(host, port, what[i], nil, nil, allReqs); + end + local p = assert(http.pipeline(host, port, allReqs)); + for i, response in ipairs(p) do responses[#responses+1] = response end + condvar "signal"; +end + +function many_requests (host, port) + local threads = {}; + local responses = {}; + local condvar = nmap.condvar(responses); + local i = 1; + repeat + local j = math.min(i+10, #requests); + local co = stdnse.new_thread(thread_main, host, port, responses, + unpack(requests, i, j)); + threads[co] = true; + i = j+1; + until i > #requests; + repeat + condvar "wait"; + for thread in pairs(threads) do + if coroutine.status(thread) == "dead" then threads[thread] = nil end + end + until next(threads) == nil; + return responses; +end + + + + For brevity, this example omits typical behavior of a traditional web + spider. The requests table is assumed to contain a number of objects + (hundreds or thousands) to warrant the use of worker threads. Our + example will dispatch a new thread with 11 relative + Uniform Resource Identifiers (URI) to request, up to the length of the + requests table. Worker threads are very cheap so we + are not afraid to create a lot of them. After we dispatch this large + number of threads, we wait on our Condition Variable until every thread + has finished then finally return the responses table. + + + You may have noticed that we did not use the status function returned + by stdnse.new_thread. 
You will typically use this + for debugging or if your program must stop based on the error thrown by + one of your worker threads. Our simple example did not require this but + a fault-tolerant library may. + + + + Thread Mutexes + threads in NSE + mutexes in NSE + + Recall from the beginning of this section that each script execution + thread (e.g. ftp-anon running against an FTP server + on a target host) yields to other scripts whenever it makes a call + on network objects (sending or receiving data). Some scripts require + finer concurrency control over thread execution. An example is the + whois script which queries + whois servers for each + target IP address. Because many concurrent queries often result in + getting one's IP address banned for abuse, and because a single query may + return additional information for targets other threads are running + against, it is useful to have other threads pause while one thread + performs a query. + + + To solve this problem, NSE includes a mutex function + which provides a mutex + (mutual exclusion object) usable by scripts. A Mutex allows only + one thread at a time to work on an object. Competing threads waiting to + work on this object are put in the waiting queue until they can get a + "lock" on the Mutex. A solution for the whois + problem above is to have each thread block on a Mutex using a common + string, thus ensuring that only one thread is querying whois servers at + once. When finished querying the remote servers, the thread can store + results in the NSE registry and unlock the Mutex. Other scripts waiting + to query the remote server can then obtain a lock, check for usable + results retrieved from previous queries, make their own queries if the + earlier results are insufficient, and + unlock the Mutex. This is a good example of serializing access to a + remote resource. 
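The whois solution described above can be sketched as follows. This is a minimal illustration, not the code of the real whois script: cached_query and its parameters are hypothetical names, and we assume the caller passes in a Mutex function obtained from nmap.mutex and a cache table that all threads share (for example one kept in the NSE registry).

```lua
-- Hypothetical helper: serialize a remote query and share its result.
--   mutex : a Mutex function, e.g. the value returned by nmap.mutex("whois")
--   cache : a table shared by all threads, e.g. one stored in nmap.registry
--   key   : what to look up (for example an IP address string)
--   query : function(key) that performs the remote query
-- Assumption: query never returns nil, so nil means "not cached yet".
local function cached_query (mutex, cache, key, query)
  mutex "lock";               -- wait until no other thread holds the lock
  local result = cache[key];  -- an earlier thread may already have the answer
  if result == nil then
    result = query(key);      -- only one thread at a time gets here
    cache[key] = result;      -- share the result with the threads still waiting
  end
  mutex "done";               -- let the next waiting thread proceed
  return result;
end
```

The sketch does not handle an error raised by query; a real script would have to decide how to release the Mutex in that case.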
+ + + + The first step in using a Mutex is to create one via a call to the + nmap library: + + + mutexfn = nmap.mutex(object) + + + The mutexfn returned is a function which works as a + Mutex for the object passed in. This object can be + any Lua data + type except nil, + booleans, and numbers. The + returned function allows you to lock, try to lock, and release the + Mutex. Its first and only parameter must be one of the + following: + + + + + "lock" + + + Makes a blocking lock on the Mutex. If the Mutex is busy (another + thread has a lock on it), then the thread will yield and + wait. The function returns with the Mutex locked. + + + + + + "trylock" + + + Makes a non-blocking lock on the Mutex. If the Mutex is busy then + it immediately returns with a return value of + false. Otherwise the function locks the Mutex and + returns true. + + + + + + "done" + + + Releases the Mutex and allows another thread to lock it. If the + thread does not have a lock on the Mutex, an error will be + raised. + + + + + + "running" + + + Returns the thread that holds the lock on the Mutex or nil + if the Mutex is not locked. This should only be used for + debugging as it interferes with garbage collection of finished + threads. + + + + + + + NSE maintains a weak reference to the Mutex so other calls to + nmap.mutex with the same object will return the same + function (Mutex); however, if you discard your reference to the Mutex + then it may be collected, and subsequent calls to + nmap.mutex with the object will return a different + Mutex function! Thus you should save your Mutex to a (local) variable + that persists for as long as you need it. + + + + A simple example of using the API is provided in . For + real-life examples, read the asn-query.nse and + whois.nse scripts in the Nmap + distribution. 
+ + + + Mutex manipulation + +local mutex = nmap.mutex("My Script's Unique ID"); +function action(host, port) + mutex "lock"; + -- Do critical section work - only one thread at a time executes this. + mutex "done"; + return script_output; +end + + + + + Condition Variables + + Condition Variables arose out of a need to coordinate with worker + threads created using the stdnse.new_thread + function. A Condition Variable allows one or more threads to wait on + an object and one or more threads to awaken one or all threads waiting + on the object. Said differently, multiple threads may unconditionally + block on the Condition Variable by + waiting. Other threads may wake up one or all of + the waiting threads via signalling the Condition + Variable. + + + + As an example, we may dispatch multiple worker threads that will + produce results for us to use, like our earlier . Until all + the workers finish, our master thread must sleep. Note that we cannot + poll for results like in a traditional Operating + System thread because NSE does not preempt Lua threads. Instead, + we use a Condition Variable that the master thread + waits on until awakened by a worker. The master + will continually wait until all workers have terminated. + + + + The first step in using a Condition Variable is to create one via a + call to the nmap library: + + + condvarfn = nmap.condvar(object) + + + The semantics for Condition Variables are similar to Mutexes. The + condvarfn returned is a function which works as a + Condition Variable for the object passed in. This + object can be any Lua data + type except nil, + booleans, and numbers. The + returned function allows you to wait, signal, and broadcast on the + Condition Variable. Its first and only parameter must be one of the + following: + + + + + "wait" + + + Wait on the Condition Variable. This adds your thread to the + waiting queue for the Condition Variable. 
You will resume + execution when another thread signals or broadcasts on the + Condition Variable. + + + + + "signal" + + + Signal the Condition Variable. A thread in the Condition + Variable's waiting queue will be resumed. + + + + + "broadcast" + + + Signal all threads in the Condition Variable's waiting + queue. + + + + + + + As with Mutexes, NSE maintains a weak reference to the Condition + Variable so other calls to nmap.condvar with the + same object will return the same function (Condition Variable); + however, if you discard your reference to the Condition Variable then + it may be collected, and subsequent calls to + nmap.condvar with the object will return a different + Condition Variable function! Thus you should save your Condition + Variable to a (local) variable that persists for as long as you + need it. + + + + When using Condition Variables, it is important to check the predicate + before and after waiting. A predicate is a test on whether to continue + doing work within your worker or master thread. For your worker + threads, this will at the very least include a test to see if the + master thread is still alive. You do not want to continue doing work + when no thread will use your results. A typical test before waiting + may be: check whether the master is still running and quit if it is not; + then check whether there is work to be done and wait if there is none. + + + + NSE does not guarantee that spurious wakeups will not occur; that is, + your thread may be awakened even though no thread called + "signal" or "broadcast" on the + Condition Variable. The typical, but not only, reason for a spurious + wakeup is the termination of a thread using a Condition Variable. NSE + guarantees this wakeup on termination, which lets you avoid a deadlock + in which a worker or master waits to be awakened by a thread that ended + without signaling the Condition Variable. 
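The predicate test described above might look like the following sketch for a worker that pulls jobs from a shared queue. The names next_job, queue, and master are hypothetical. We assume condvar is a function returned by nmap.condvar for the queue, and that the master handed its own Lua thread to the worker as an argument when it called stdnse.new_thread.

```lua
-- Hypothetical worker helper: return the next job, or nil when the worker
-- should quit because nobody is left to use its results.
--   queue   : array of pending jobs shared with the master
--   master  : the master's Lua thread (coroutine)
--   condvar : Condition Variable function, e.g. nmap.condvar(queue)
local function next_job (queue, master, condvar)
  while true do
    -- Predicate part 1: is the master still alive? If not, stop working.
    if coroutine.status(master) == "dead" then
      return nil;
    end
    -- Predicate part 2: is there work to do? If so, take it.
    if #queue > 0 then
      return table.remove(queue, 1);
    end
    -- Nothing to do yet. After waking up we loop and test the predicate
    -- again, because the wakeup may have been spurious.
    condvar "wait";
  end
end
```

Because the whole predicate is re-evaluated after every wakeup, the worker behaves correctly whether it was awakened by "signal", by "broadcast", or by the termination of another thread.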
+ + + + Collaborative Multithreading + + One of Lua's least known features is collaborative multithreading + through coroutines. A coroutine provides an + independent execution stack that is resumable. + The standard coroutine provides access to the + creation and manipulation of coroutines. Lua's online first + edition of Programming in + Lua contains an excellent introduction to + coroutines. We will provide an overview of the + use of coroutines here for completeness but this is no replacement for + reviewing PiL. + + + + We have mentioned coroutines throughout this section as + threads. This is the type + (thread) of a coroutine in Lua. Users of NSE that + have any parallel programming experience with Operating System threads + may be confused by this. As a reminder, Nmap is single threaded. Lua + threads provide the basis for parallel scripting but only one thread is + ever running at a time. + + + + A Lua function executes on top of a Lua + thread. The thread maintains a stack of active + functions, local variables, and the current instruction. We can switch + between coroutines by explicitly yielding the + running thread. The coroutine which resumed the + yielded thread resumes operation. + + shows a brief use of coroutines to print numbers. + + + Basic Coroutine Use + +local function main () + coroutine.yield(1) + coroutine.yield(2) + coroutine.yield(3) +end +local co = coroutine.create(main) +for i = 1, 3 do + print(coroutine.resume(co)) +end +--> true 1 +--> true 2 +--> true 3 + + + + + What you should take from this example is the ability to transfer + between flows of control extremely easily through the use of + coroutine.yield. This is an extremely powerful + concept that enables NSE to run scripts in parallel. All scripts are + run as coroutines that yield whenever they make a blocking socket + function call. This enables NSE to run other scripts and later resume + the blocked script when its I/O operation has completed. 
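To make the idea concrete, here is a toy round-robin scheduler in plain Lua. It is only an illustration of the technique and not NSE's actual scheduler: make_task and run_all are hypothetical names, and the bare coroutine.yield() stands in for a blocking socket call.

```lua
local log = {}

-- Create a "script" that does some work and yields after every step.
local function make_task (name, steps)
  return coroutine.create(function ()
    for i = 1, steps do
      log[#log+1] = name .. ":" .. i
      coroutine.yield() -- stand-in for a blocking socket call
    end
  end)
end

-- Resume every task in turn until all of them have finished.
local function run_all (tasks)
  while #tasks > 0 do
    local still_alive = {}
    for _, co in ipairs(tasks) do
      assert(coroutine.resume(co)) -- propagate any error from the task
      if coroutine.status(co) ~= "dead" then
        still_alive[#still_alive+1] = co
      end
    end
    tasks = still_alive
  end
end

run_all({ make_task("a", 2), make_task("b", 3) })
print(table.concat(log, " "))
--> a:1 b:1 a:2 b:2 b:3
```

The output shows the two tasks interleaving even though only one of them is ever running at a time.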
+ + + + As a script writer, you will find times when coroutines are the best + tool for a job. One common use in socket programming is to filter + data. You may produce a function that generates all the links from an + HTML document. An iterator using string.gmatch + matches only a single pattern. Because some complex matches may take + many different Lua patterns, it is more appropriate to use a + coroutine. + + shows how to do this. + + + + Link Generator + +function links (html_document) + local function generate () + for m in string.gmatch(html_document, "url%((.-)%)") do + coroutine.yield(m) -- css url + end + for m in string.gmatch(html_document, "href%s*=%s*\"(.-)\"") do + coroutine.yield(m) -- anchor link + end + for m in string.gmatch(html_document, "src%s*=%s*\"(.-)\"") do + coroutine.yield(m) -- img source + end + end + return coroutine.wrap(generate) +end + +function action (host, port) + -- ... get HTML document and store in html_document local + local found = {}; -- collects the generated links + for link in links(html_document) do + found[#found+1] = link; -- store it + end + -- ... +end + + + + + There are many other instances where coroutines may provide an + easier solution to a problem. Identifying those cases takes + experience. + + + + The Base Thread + + Because scripts may use coroutines for their own multithreading, + it is important to be able to identify an owner + of a resource or to establish whether the script is still alive. + NSE provides the function stdnse.base for this + purpose. + + + Particularly when writing a library that attributes + ownership of a cache or socket to a script, you may use the + base thread to establish whether the script is still running. + coroutine.status on the base thread will give + the current state of the script. In cases where the script is + "dead", you will want to release the resource. + Be careful with keeping references to these threads; NSE may + discard a script even though it has not finished executing. 
The + thread will still report a status of "suspended". + You should keep a weak reference to the thread in these cases + so that it may be collected. + + + + + Version Detection Using NSE Nmap Scripting Engine (NSE)sample scripts