A large-scale scan of the top million web sites (per Alexa traffic data) was
performed in early 2010 using the Nmap
Security Scanner and its scripting engine. (As seen in the
New York Times.)
We retrieved each site's icon by first parsing
the HTML for a link tag and then falling back to /favicon.ico if that
failed. 328,427 unique icons were collected, of which 288,945 were
proper images. The remaining 39,482 were error strings and other
non-image files. Our original goal was just to improve our http-favicon.nse
script, but we had enough fun browsing so many icons that we used them
to create the visualization below.
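
The icon discovery step described above (look for a link tag in the HTML, otherwise fall back to /favicon.ico) can be sketched roughly as follows. This is a minimal Python illustration, not the code used in the survey, which ran on the Nmap Scripting Engine. The names `IconLinkParser` and `icon_url` are hypothetical, and the sketch only covers the case where no icon link tag is present; it does not retry /favicon.ico when the linked icon fails to download.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IconLinkParser(HTMLParser):
    """Remembers the href of the first <link> tag whose rel includes "icon"."""

    def __init__(self):
        super().__init__()
        self.icon_href = None

    def handle_starttag(self, tag, attrs):
        # Only the first matching <link> tag is kept (an assumption of this sketch).
        if tag != "link" or self.icon_href is not None:
            return
        attr = dict(attrs)
        # rel may hold several tokens, e.g. "shortcut icon".
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "icon" in rel_tokens and attr.get("href"):
            self.icon_href = attr["href"]


def icon_url(page_url, html):
    """Return the icon URL named in the HTML, or the /favicon.ico fallback."""
    parser = IconLinkParser()
    parser.feed(html)
    if parser.icon_href:
        # Resolve relative hrefs against the page URL.
        return urljoin(page_url, parser.icon_href)
    return urljoin(page_url, "/favicon.ico")
```

The function takes the page's HTML as a string, so the caller decides how to fetch it and how to handle timeouts.
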
The area of each icon is proportional to the sum of the reach of
all sites using that icon. When both a bare domain name and its
"www." counterpart used the same icon, only one of them was counted.
The smallest icons--those corresponding to sites with approximately
0.0001% reach--are scaled to 16x16 pixels. The largest icon (Google)
is 11,936 x 11,936 pixels, and the whole diagram is 37,440 x 37,440
(1.4 gigapixels). Since your web browser would choke on that, we have
created the interactive viewer below (click and drag to pan,
double-click to zoom, or type in a site name to go right to it).
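
The sizing rule above can be illustrated with a short Python sketch. It assumes that area is exactly proportional to reach, so the side length grows with the square root of reach, and that a reach of 0.0001% maps to a 16-pixel side. The function names and the input format are hypothetical. The text does not say which of a bare domain and its "www." counterpart was counted, so this sketch simply keeps whichever appears first.

```python
import math

MIN_REACH_PERCENT = 0.0001  # reach of the smallest icons, per the text
MIN_SIDE_PIXELS = 16        # side length given to those icons


def strip_www(host):
    """Map "www.example.com" and "example.com" to the same key."""
    return host[4:] if host.startswith("www.") else host


def reach_per_icon(sites):
    """Sum reach per icon hash.

    sites: iterable of (hostname, icon_hash, reach_percent) tuples.
    A bare domain and its "www." counterpart sharing an icon count once;
    the first one seen is kept (an assumption of this sketch).
    """
    seen = set()
    totals = {}
    for host, icon_hash, reach in sites:
        key = (strip_www(host.lower()), icon_hash)
        if key in seen:
            continue
        seen.add(key)
        totals[icon_hash] = totals.get(icon_hash, 0.0) + reach
    return totals


def side_pixels(reach_percent):
    """Side length in pixels, so that area is proportional to reach."""
    return MIN_SIDE_PIXELS * math.sqrt(reach_percent / MIN_REACH_PERCENT)
```

Under these assumptions, an icon with 100 times the minimum reach has 100 times the area, which means a side 10 times longer (160 pixels).
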
The graphic has been made into a 24x36 inch poster.
We have only printed 15 posters (for Nmap developers) so far, but
we're considering an offset print run if there is enough demand. If
you might be interested in buying a physical copy of the poster,
please fill out this short form:
For downloads of programs and data files, go to
- Why are some sites not found?
- There are a few possible causes. First, the site may not have been
among the top million at the time the survey was done. Check
the data file to see if it was present. Second, the site may have
changed its icon since the survey was done. This page downloads the
current icon of the site you type in, and looks up its hash in a
database. Failing that, it will look up the site name in the database,
but that only works if you use the exact same name we did when doing the
survey. Third, it's possible that the site timed out or didn't have an
icon at the time of the survey. Fourth, this page limits the size of the
icons it will download. If an icon file is too big, it won't be found.
In that case, calculate the MD5 sum of the icon yourself and enter it in
the search box.
- Why are some icons (Amazon, Bing, Baidu) so small?
- This usually indicates that the main site timed out during the
survey, and only less popular sites using the same icon responded. In
other words, it represents a data collection error. For example,
baidu.com didn't respond, but baidu.hk and baidu.jp did, and so what
would have been one of the biggest icons is instead small.
page for more technical details and caveats. We didn't fudge the
data after the survey or attempt to fill in any obviously "missing"
- Why are there two "Я" Yandex icons?
- Look closely. The icons are different. The uniqueness of icons is
based on their MD5 hash, so even icons that are visually identical may
in fact be different. Remember, the original impetus for this scan was
to improve the hash database of an
Nmap Scripting Engine script.
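
The lookup order described in the FAQ above (hash of the current icon first, then the exact site name) can be sketched in Python as follows. The dictionaries `by_hash` and `by_name`, the function names, and the `max_bytes` parameter are all hypothetical stand-ins; the real database format and the real size limit are not given here. The sketch also shows why two icons that look alike can count as different: any difference in the file's bytes changes the MD5 hash.

```python
import hashlib


def icon_md5(icon_bytes):
    """Hex MD5 digest of the raw icon file contents."""
    return hashlib.md5(icon_bytes).hexdigest()


def lookup(icon_bytes, site_name, by_hash, by_name, max_bytes=None):
    """Look up an icon by hash, then fall back to the exact site name.

    icon_bytes may be None if the download failed. If max_bytes is set
    (a hypothetical limit), larger icons skip the hash lookup.
    """
    entry = None
    too_big = max_bytes is not None and icon_bytes is not None and len(icon_bytes) > max_bytes
    if icon_bytes is not None and not too_big:
        entry = by_hash.get(icon_md5(icon_bytes))
    if entry is None:
        # Only works if the name matches the one used in the survey exactly.
        entry = by_name.get(site_name)
    return entry
```

To check an icon by hand, compute `icon_md5` over the downloaded file's bytes and search for the resulting hex string.
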
Programming and design were done by David Fifield, and
scanning was performed by Brandon Enright.