Nmap's algorithm for detecting matches is relatively simple. It
takes a subject fingerprint and tests it
against every single reference fingerprint in nmap-os-db.
When testing against a reference fingerprint, Nmap looks at each
probe category line from the subject fingerprint (such as
SEQ or T1) in turn. Any probe
lines which do not exist in the reference
fingerprint are skipped. When the reference fingerprint does have a
matching line, they are compared.
For a probe line comparison, Nmap examines every individual test
(R, DF, W, etc.) from the subject category line in turn. Any tests which do
not exist in the reference line are skipped.
Whenever a matching test is found, Nmap increments the
PossiblePoints accumulator by the number of points
assigned to this test. Then the test values are compared. If the
reference test has an empty value, the subject test only matches if
its value is empty too. If the reference test is just a plain string
or number (no operators), the subject test must match it exactly. If
the reference string contains operators (|, -, >, or
<), the subject must match as described in the section called “Test expressions”. If a test matches, the
NumMatchPoints accumulator is incremented by the
test's point value.
Once all of the probe lines are tested for a fingerprint, Nmap
divides NumMatchPoints by PossiblePoints. The result is a confidence factor
describing the probability that the subject fingerprint matches that
particular reference fingerprint. For example,
1.00 is a perfect match while
0.95 is very close (95%).
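The accumulation described above can be sketched in a few lines of Python. This is a simplified illustration, not Nmap's implementation: the function name match_confidence, the dictionary layout, and the point values are all hypothetical, and test values are compared for exact equality only (operator expressions are not handled).

```python
def match_confidence(subject, reference, points):
    """Return NumMatchPoints / PossiblePoints for one reference fingerprint.

    subject and reference map probe line names to {test name: value} dicts.
    points maps (probe line, test name) pairs to point values.
    All of these structures are hypothetical, chosen for illustration.
    """
    possible_points = 0
    num_match_points = 0
    for line_name, subject_tests in subject.items():
        reference_tests = reference.get(line_name)
        if reference_tests is None:
            continue  # probe line missing from the reference: skipped
        for test_name, subject_value in subject_tests.items():
            if test_name not in reference_tests:
                continue  # test missing from the reference line: skipped
            value = points.get((line_name, test_name), 0)
            possible_points += value
            # Simplification: exact comparison only. An empty reference
            # value therefore matches only an empty subject value.
            if subject_value == reference_tests[test_name]:
                num_match_points += value
    if possible_points == 0:
        return 0.0  # nothing comparable; treated here as no match
    return num_match_points / possible_points


# Toy data invented for this sketch, not taken from nmap-os-db.
points = {("T1", "R"): 100, ("T1", "DF"): 20, ("T1", "W"): 25, ("SEQ", "TI"): 100}
subject = {
    "T1": {"R": "Y", "DF": "Y", "W": "FFFF"},
    "SEQ": {"TI": "I"},
    "T3": {"R": "N"},
}
reference = {"T1": {"R": "Y", "DF": "N", "W": "FFFF"}, "SEQ": {"TI": "I"}}

confidence = match_confidence(subject, reference, points)
print(f"{confidence:.2f}")
```

Here the T3 line is skipped because the reference lacks it, and only the DF test in T1 fails to match, so 225 of 245 possible points are awarded.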
Test point values are assigned by a special
MatchPoints entry (which may only appear once) in
nmap-os-db. This entry looks much like a
normal fingerprint, but instead of providing results for each test, it
provides point values (non-negative integers) for each test. A point
value listed in the MatchPoints structure applies only to that test
within the probe line where it is listed. So a value given for the
W (Window size) test in one probe line (such as T1) doesn't affect the
W test in another (such as T3).
A test can be effectively disabled by assigning it a point value of 0.
An example MatchPoints structure is given in Example 8.10.
Example 8.10. The MatchPoints structure
Once all of the reference fingerprints have been evaluated, Nmap
orders them and prints the perfect matches (if there aren't too many).
If there are no perfect matches, but some are very close, Nmap may
print those. Guesses are more likely to be printed if the
--osscan-guess option is given.
IPv6 OS classification uses a machine learning technique called logistic
regression. Nmap uses the LIBLINEAR
library to do this classification. The process starts with a large
corpus of training examples, which are fingerprints submitted by Nmap
users and carefully labeled with their OS. Each training example is
represented by a feature vector, which can be thought of as the
“coordinates” of that OS in a multi-dimensional space. The
training algorithm calculates an optimal boundary between members of
each OS class and members of every other class. It then encodes each of
these boundaries as a vector. There is a different vector for each OS
class.
When matching, the engine takes each of these boundary vectors in turn
and calculates a dot product between it and the feature vector. The
result is a single real number. The higher (more positive) the number,
the more likely the match. Negative numbers are unlikely matches. A
number x is mapped from the range
[−∞, ∞] to [0, 100] using the logistic formula
100 / (1 + e^(−x)).
(This is the source of the name “logistic regression”.)
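As a rough illustration of the scoring step, the sketch below computes the dot product of a boundary vector and a feature vector and passes it through the logistic formula. The function name logistic_score and the sample vectors are hypothetical; whether Nmap's model adds a bias term or other scaling is not stated here, so none is assumed.

```python
import math


def logistic_score(boundary, features):
    """Map the dot product of two equal-length vectors into [0, 100].

    Hypothetical helper for illustration; no bias term is assumed.
    """
    x = sum(b * f for b, f in zip(boundary, features))
    # Both branches compute 100 / (1 + e^(-x)); the split only avoids
    # floating-point overflow when x is a large negative number.
    if x >= 0:
        return 100.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return 100.0 * e / (1.0 + e)


# Invented vectors: a positive dot product scores above 50,
# a negative one scores below 50.
likely = logistic_score([0.5, 1.0], [2.0, 1.0])     # x = 2.0
unlikely = logistic_score([0.5, 1.0], [-2.0, -1.0])  # x = -2.0
print(round(likely, 2), round(unlikely, 2))
```

A dot product of exactly zero lands on the class boundary and yields a score of 50.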
In general, the OS class with the highest score is the most likely
match, but in the case of a never-before-seen operating system, it's
possible to have a very high score but an inaccurate match nevertheless.
Therefore a second “novelty detection”
algorithm checks whether the observed fingerprint is very unlike the
other representatives of the class. The algorithm finds the Euclidean
distance from the observed feature vector to the mean of the feature
vectors of the members of the class, scaled in each dimension by the
inverse of that feature's variance. Feature vectors similar to those
already seen will have low novelty, and those that are different will
have high novelty.
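One plausible reading of this distance is sketched below: each squared difference from the class mean is divided by that feature's variance before summing and taking the square root. The function name novelty, the handling of zero-variance features, and the sample numbers are assumptions made for illustration; the exact scaling in Nmap's source may differ.

```python
import math


def novelty(observed, mean, variance):
    """Variance-scaled Euclidean distance from a class mean.

    Hypothetical helper: each squared difference is weighted by the
    inverse of that feature's variance.
    """
    total = 0.0
    for x, m, v in zip(observed, mean, variance):
        if v == 0:
            continue  # assumption: ignore features with no spread
        total += (x - m) ** 2 / v
    return math.sqrt(total)


# Invented class statistics for a two-feature example.
class_mean = [0.0, 0.0]
class_variance = [4.0, 1.0]

print(novelty([0.0, 0.0], class_mean, class_variance))  # at the mean
print(novelty([2.0, 0.0], class_mean, class_variance))  # one std. dev. away
```

Dividing by the variance means that a deviation in a feature that normally varies widely contributes less novelty than the same deviation in a tightly clustered feature.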
The OS class with the highest score is reported as a match, but only if
the novelty is below 15. Also, if the two highest OS classes have scores
that differ by less than 10%, the classification is considered ambiguous
and not a successful match.
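The acceptance rule can be sketched as follows. The function name pick_match and its parameters are hypothetical, and the sketch assumes that "differ by less than 10%" means a gap of fewer than 10 percentage points between the two highest scores; the text does not say whether the comparison is absolute or relative. The sample values are the first two rows of Table 8.9.

```python
def pick_match(candidates, novelty_threshold=15.0, min_gap=10.0):
    """Return the name of the accepted OS class, or None.

    candidates is a list of (score_percent, novelty, name) tuples.
    Assumption: min_gap is measured in percentage points.
    """
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    if not ranked:
        return None
    best_score, best_novelty, best_name = ranked[0]
    if best_novelty >= novelty_threshold:
        return None  # too unlike known members of the class
    if len(ranked) > 1 and best_score - ranked[1][0] < min_gap:
        return None  # top two classes too close: ambiguous
    return best_name


rows = [
    (61.05, 1.00, "Apple Mac OS X 10.6.8 - 10.7.0 (Snow Leopard - Lion) (Darwin 10.8.0 - 11.0.0)"),
    (10.08, 18.04, "Apple Mac OS X 10.7 (Lion) (Darwin 11.1.0)"),
]
print(pick_match(rows))
```

Under these assumptions the top class is accepted, since its novelty is below 15 and it leads the runner-up by more than 10 points.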
Sample logistic and novelty scores from a run against Mac OS X 10.6.8
are shown in Table 8.9, “OS guesses against Mac OS X”.
Table 8.9. OS guesses against Mac OS X
|Score||Novelty||OS class|
|61.05%||1.00||Apple Mac OS X 10.6.8 - 10.7.0 (Snow Leopard - Lion) (Darwin 10.8.0 - 11.0.0)|
|10.08%||18.04||Apple Mac OS X 10.7 (Lion) (Darwin 11.1.0)|
|9.97%||24.06||Apple Mac OS X 10.6.8 (Snow Leopard) (Darwin 10.8.0)|
|9.43%||19.26||Apple Mac OS X 10.7.2 (Lion) (Darwin 11.2.0)|
|5.99%||23.63||Apple Mac OS X 10.4.11 (Tiger) (Darwin 8.11.1)|
|2.28%||34.67||Apple iPhone mobile phone (iOS 4.2.1)|
|2.19%||35.07||Apple Mac OS X 10.4.7 (Panther) (Apple TV 3.0.2)|
|2.19%||57.63||HP ProCurve 2520G switch|
|2.04%||37.03||Apple Mac OS X 10.6.8 (Snow Leopard) (Darwin 10.8.0)|
|2.03%||68.55||Apple Mac OS X 10.6.8 (Snow Leopard) (Darwin 10.8.0)|
There isn't a separate data file containing the IPv6 OS database as
there is with IPv4. The database is stored in the C++ source code file
FPModel.cc. This file contains scaling constants
(used to put feature values roughly into the range [0, 1]), and
the boundary vectors described above.