Icons of the Web
Update: We did it again! Check out how much changed in 3 years in the Favicon Project 2013.
A large-scale scan of the top million web sites (per Alexa traffic data) was performed in early 2010 using the Nmap Security Scanner and its scripting engine. As seen in the New York Times, Slashdot, Gizmodo, Engadget, and Telegraph.co.uk ...
We retrieved each site's icon by first parsing the HTML for a link tag and then falling back to /favicon.ico if that failed. 328,427 unique icons were collected, of which 288,945 were proper images. The remaining 39,482 were error strings and other non-image files. Our original goal was just to improve our http-favicon.nse script, but we had enough fun browsing so many icons that we used them to create the visualization below.
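The lookup order described above (link tag first, then /favicon.ico) can be sketched in a few lines of Python. This is only an illustration, not the code of http-favicon.nse, which is an Nmap Scripting Engine script. The names `IconLinkParser` and `find_icon_url` are hypothetical, and the rule for matching the `rel` attribute is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IconLinkParser(HTMLParser):
    """Remembers the href of the first <link> tag whose rel includes "icon"."""

    def __init__(self):
        super().__init__()
        self.icon_href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.icon_href is not None:
            return
        attr = dict(attrs)
        # Assumption: rel="icon" and rel="shortcut icon" both count as a match.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "icon" in rel_tokens and attr.get("href"):
            self.icon_href = attr["href"]


def find_icon_url(page_url, html):
    """Return the icon URL from a link tag, else fall back to /favicon.ico."""
    parser = IconLinkParser()
    parser.feed(html)
    if parser.icon_href:
        # Resolve relative hrefs against the page URL.
        return urljoin(page_url, parser.icon_href)
    return urljoin(page_url, "/favicon.ico")
```

The function only decides which URL to request. Downloading the icon and checking whether the response is a proper image are separate steps.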
The area of each icon is proportional to the sum of the reach of all sites using that icon. When both a bare domain name and its "www." counterpart used the same icon, only one of them was counted. The smallest icons--those corresponding to sites with approximately 0.0001% reach--are scaled to 16x16 pixels. The largest icon (Google) is 11,936 x 11,936 pixels, and the whole diagram is 37,440 x 37,440 (1.4 gigapixels). Since your web browser would choke on that, we have created the interactive viewer below (click and drag to pan, double-click to zoom, or type in a site name to go right to it).
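Because area, not side length, is proportional to reach, the side of an icon grows with the square root of its reach. Here is a minimal sketch of that scaling. It assumes the 16x16 minimum corresponds to exactly 0.0001% reach (the text only says "approximately"), and the name `icon_side_px` is hypothetical.

```python
import math

MIN_SIDE_PX = 16        # from the text: the smallest icons are 16x16 pixels
MIN_REACH_PCT = 0.0001  # assumption: treated as exact for this sketch
DIAGRAM_SIDE_PX = 37440  # from the text: the whole diagram is 37,440 x 37,440


def icon_side_px(reach_pct):
    """Side length in pixels of an icon whose area is proportional to reach."""
    area = MIN_SIDE_PX ** 2 * (reach_pct / MIN_REACH_PCT)
    return round(math.sqrt(area))


# Total pixel count of the diagram, about 1.4 gigapixels.
total_pixels = DIAGRAM_SIDE_PX ** 2
```

Under this assumption, a site with four times the minimum reach gets an icon twice as wide, and one with a hundred times the minimum reach gets an icon ten times as wide.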
We have only printed 15 posters (for Nmap developers) so far, but we're considering an offset print run if there is enough demand. If you might be interested in buying a physical copy of the poster, please fill out this short form:
For downloads of programs and data files, go to this page.
- Why are some sites not found?
- There are a few possible causes. First, the site may not have been among the top million at the time of the survey; check the data file to see if it was present. Second, the site may have changed its icon since the survey. This page downloads the current icon of the site you type in and looks up its hash in a database. Failing that, it looks up the site name in the database, but that only works if you use the exact same name we used in the survey. Third, the site may have timed out or had no icon at the time of the survey. Fourth, this page limits the size of the icons it will download, so an icon file that is too big won't be found. In that case, calculate the MD5 sum of the icon yourself and enter it in the search box.
- Why are some icons (Amazon, Bing, Baidu) so small?
- This usually indicates that the main site timed out during the survey, and only less popular sites using the same icon responded. In other words, it represents a data collection error. For example, baidu.com didn't respond, but baidu.hk and baidu.jp did, and so what would have been one of the biggest icons is instead small. See this page for more technical details and caveats. We didn't fudge the data after the survey or attempt to fill in any obviously "missing" icons.
- Why are there two "Я" Yandex icons?
- Look closely. The icons are different. The uniqueness of icons is based on their MD5 hash, so even icons that are visually identical may in fact be different. Remember, the original impetus for this scan was to improve the hash database of an Nmap Scripting Engine script.
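Since icons are identified by their MD5 hash, computing the hash of an icon file yourself takes only a few lines with Python's standard `hashlib` module. This is a sketch: the function names are hypothetical, and the assumption that the search box expects a lowercase hexadecimal digest is not confirmed by the text above.

```python
import hashlib


def icon_md5(data):
    """Return the MD5 digest of raw icon bytes as a lowercase hex string."""
    return hashlib.md5(data).hexdigest()


def icon_md5_file(path):
    """Read an icon file in binary mode and return its MD5 hex digest."""
    with open(path, "rb") as f:
        return icon_md5(f.read())
```

Two icon files that differ in even a single byte produce different digests, which is why visually identical icons can appear as separate entries.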
Programming and design were done by David Fifield, and scanning was performed by Brandon Enright.