Mapping The Web

One of the current projects at WhoisIreland.com is building a country-based map of the web. The main datasets are the com/net/org/biz/info gTLDs. Essentially, it means building an IP-based map of the websites associated with approximately 54 million domains.
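As a rough illustration of what an IP-based country map involves, here is a minimal Python sketch. It is not the WhoisIreland.com implementation. The range table, the `country_for_ip` and `map_domains` names, and the choice to resolve the `www.` hostname are all assumptions made for the example, and the address ranges are documentation prefixes rather than real country allocations.

```python
import bisect
import ipaddress
import socket
from collections import Counter

# Hypothetical IP-range table: (first address, last address, country code).
# These are RFC 5737 documentation prefixes, not real allocations.
RANGES = sorted([
    (int(ipaddress.ip_address("192.0.2.0")),
     int(ipaddress.ip_address("192.0.2.255")), "IE"),
    (int(ipaddress.ip_address("198.51.100.0")),
     int(ipaddress.ip_address("198.51.100.255")), "US"),
    (int(ipaddress.ip_address("203.0.113.0")),
     int(ipaddress.ip_address("203.0.113.255")), "DE"),
])
STARTS = [first for first, _last, _cc in RANGES]


def country_for_ip(ip):
    """Return the country code for an IP, or None if no range covers it."""
    n = int(ipaddress.ip_address(ip))
    # Find the last range whose first address is <= n.
    i = bisect.bisect_right(STARTS, n) - 1
    if i >= 0 and RANGES[i][0] <= n <= RANGES[i][1]:
        return RANGES[i][2]
    return None


def map_domains(domains, resolve=socket.gethostbyname):
    """Count websites per country; 'resolve' is injectable for testing."""
    counts = Counter()
    for domain in domains:
        try:
            # Assumption: the website lives on the www. hostname.
            ip = resolve("www." + domain)
        except OSError:  # socket.gaierror is a subclass of OSError
            counts["unresolved"] += 1
            continue
        counts[country_for_ip(ip) or "unknown"] += 1
    return counts
```

At the scale described here, the lookups would have to be batched and run in parallel; the sketch only shows the per-domain logic of resolving a name and binary-searching a sorted range table.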

From a purely computational viewpoint, a problem of this size cannot be thought about in terms of individual numbers. It has to be thought about in abstract mathematical terms. The small gTLDs such as .info and .biz take only a few days to map, but the larger gTLDs take longer.

This is the approximate size of the problem:

.com 39.5 million domain names.
.net 6.1 million domain names.
.org 3.7 million domain names.
.info 3.64 million domain names.
.biz 1.2 million domain names.
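
The figures above add up to the "approximately 54 million" quoted earlier. A short Python check of the arithmetic, using only the numbers from the list (the variable and function names are just for illustration):

```python
# gTLD sizes in millions of domain names, as listed above.
GTLD_SIZES_MILLIONS = {
    ".com": 39.5,
    ".net": 6.1,
    ".org": 3.7,
    ".info": 3.64,
    ".biz": 1.2,
}


def total_domains_millions(sizes=GTLD_SIZES_MILLIONS):
    """Sum the per-gTLD sizes (in millions of domains)."""
    return sum(sizes.values())


if __name__ == "__main__":
    total = total_domains_millions()
    for tld, size in GTLD_SIZES_MILLIONS.items():
        # Show each gTLD's size and its share of the combined total.
        print(f"{tld:6} {size:6.2f}M  {size / total:6.1%}")
    print(f"total  {total:6.2f}M")
```

The total comes to roughly 54.14 million, with .com alone accounting for well over two thirds of it.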

On any day, there can be upwards of 450K new domains and 450K deleted domains. The utilisation for the gTLDs is around 70%. That means that approximately 70% of the domains are active. But that figure drops once the on-hold and parked domains are removed from the dataset. The .info gTLD was artificially inflated by the addition of “free” .info domains given to owners of existing .com domains. In real terms, it is only slightly bigger than the .biz gTLD.

Many of these domains, perhaps as many as 30%, are speculative and are either parked on a hosting company’s servers or are not properly set up. There is no integrity checking in these gTLDs, so it is not unusual to see a nameserver on an IP that does not exist. Small country code TLDs (ccTLDs) like .ie have integrity checks built into the system: they check whether the nameservers are answering for the domain before including the domain in the ccTLD zonefile. However, the size of the gTLDs makes such prior checking impractical.
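
To give a feel for what even a basic integrity check looks like, here is a small Python sketch that flags domains whose listed nameservers do not resolve to any address. This is a much weaker test than the one described for .ie, which checks that the nameservers actually answer for the domain; that would need a real DNS query, which the standard library does not provide. The function names and the shape of the `delegations` input are assumptions for the example.

```python
import socket


def nameserver_has_address(nameserver, resolve=socket.gethostbyname):
    """Return True if the nameserver hostname resolves to an address."""
    try:
        resolve(nameserver)
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
    return True


def audit_delegations(delegations, resolve=socket.gethostbyname):
    """Return {domain: [nameservers with no address]} for broken domains.

    'delegations' is a hypothetical dict mapping each domain name to the
    list of nameserver hostnames found for it in the zonefile.
    """
    broken = {}
    for domain, nameservers in delegations.items():
        dead = [ns for ns in nameservers
                if not nameserver_has_address(ns, resolve)]
        if dead:
            broken[domain] = dead
    return broken
```

The `resolve` parameter is there so that the check can be run against a stub in tests, or swapped for a caching resolver, since many domains share the same few nameservers.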

So what can be done with all this data? The obvious answer is that it can provide a good starting point for the “Ghosthunter Algorithm” mentioned previously. It also provides raw search engine indices for every country with a presence on the net.

Written by John McCormac on July 16th, 2005 with comments disabled.

2 comments

John McCormac
#1. July 17th, 2005, at 12:50 AM.

Yep Michele, .de is bigger than .net or .org from what I remember. :)
The integrity checks seem to be a legacy feature for most TLDs. But the integrity checking seems to be done only at registration. There is very little integrity checking after registration apart from a few aperiodic audits.