July 2005

You are currently browsing the articles from WhoisIreland Review written in the month of July 2005.

How Big Is The Web?

How big is the web? It sounds like one of those existential questions for philosophers but the answer is strange and ever changing. Google claims to search 8,058,044,651 pages. No doubt other search engines claim to search billions of pages too.

Over the last few weeks, WhoisIreland.com has been developing a new global service covering the history of every hoster in com/net/org/biz/info. Part of the project involved checking every website in com/net/org/biz/info to build an IP map of the web. The results of checking all the websites in the commercial gTLDs show some unusual patterns.

The Coming Soon/Parked web servers of the major hosters and registrars tend to stand out. The IPs of these servers tend to show up as hosting thousands of sites. In some cases, it is actually millions of parked domains. With nearly 40 million .com domains, a significant percentage of them are parked.
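The way parking servers stand out in an IP map can be sketched in a few lines of Python. This is only an illustration, not the code used for the project: the function names, the "www." prefix and the threshold of 1000 domains per IP are all assumptions made for the example.

```python
import socket
from collections import defaultdict


def build_ip_map(domains, resolve=socket.gethostbyname):
    """Group domains by the IP address that their www host resolves to.

    `resolve` is injectable so the function can be tried without network access.
    """
    ip_map = defaultdict(list)
    for domain in domains:
        try:
            ip = resolve("www." + domain)
        except OSError:
            # Covers DNS failures (socket.gaierror is an OSError subclass).
            continue
        ip_map[ip].append(domain)
    return dict(ip_map)


def likely_parking_ips(ip_map, threshold=1000):
    """Return (ip, domain_count) pairs at or above the threshold, busiest first.

    The threshold is an assumed cut-off, not a figure from the survey.
    """
    busy = [(ip, len(names)) for ip, names in ip_map.items()
            if len(names) >= threshold]
    return sorted(busy, key=lambda pair: pair[1], reverse=True)
```

An IP with a very high domain count is only a candidate: large shared hosting servers show the same pattern, so the content served would still have to be checked before calling it a parking server.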

The typical figure for domain utilisation in the gTLDs is around 70%. In plain English, that means that 70% of domains in com/net/org/biz/info are being used for something. The problem is that many of these studies do not explain what that “something” is. It could be websites. It could be mailservers. It could be just being parked. But in web terms, the number of active websites, with content, could be lower than 70%.

Google’s approximate counts for webpages in the commercial gTLDs and the number of domains in each gTLD are below:

The figures above are for the main commercial gTLDs and exclude the .edu, .mil and .name TLDs. Of course there is another question - how much of the content on these pages is unique and usable? That is a qualitative problem rather than a quantitative problem. Or as the quote from the TV sitcom “Father Ted” would have it: “That would be an ecumenical question.”

Written by John McCormac on July 31st, 2005 with comments disabled.
Read more articles on Domains And Statistics.

EURid To Stifle .eu Speculation

In a move that seems contrary to the whole registrar/reseller model on which the internet runs, the bozos in Brussels (The European Commission) have decided that only registrars who have forked out the 10,000 Euro up-front payment will be allowed to sell .eu domains. But this is an advance payment for the registration of 1000 domains rather than a fee to become a registrar. Michele Neylon blogged about it but the .eu gTLD does not seem to have registered with most hosters in Ireland.

The following clarification, quoted below, appeared on the EURid website:

“Reselling” of .eu domain name services
22 Jul 2005

Important notice concerning the “reselling” of .eu domain name services

A consultation with the European Commission services has led towards a clear position concerning the offering of so called “reseller” services for .eu domain names.

Regulation 874/2004 of the European Commission laying down the public policy rules concerning the .eu Top Level Domain states clearly that only registrars accredited by the Registry (EURid) shall be permitted to offer registration services for .eu domain names (see article 4 of the regulation). This means that the offering of services as a “reseller” (as a kind of subcontractor of an accredited registrar or as an intermediary without having concluded an agreement with the Registry in order to become an accredited registrar) is completely excluded.

EURid advises to check at all times if your service provider appears in the list of accredited registrars. Only companies and undertakings which appear in that list have the authorisation to offer .eu domain name services. ”

Since the .eu gTLD has not launched yet, it is only possible to speculate on what effect this will have on the new gTLD. Selling 1000 .eu domains in the first few months of launch is possible. But it is only possible for the top hosting companies in Ireland. The ISPs may have to sit this one out as they seem to just keep losing clients to the second generation hosters. The big question is whether people will adopt the .eu gTLD. It has a lot of competition.

As a speculative market, the .com gTLD took off a few years ago when it was effectively deregulated. Prior to that, Network Solutions had been the registry and sole registrar for the .com, .net and .org gTLDs. The price of a domain went from free (the good old days) to $100, $110, and to $70. When other registrars were permitted to sell .com/net/org, the prices dropped dramatically. The effect of that price drop coupled with the whole dot.bomb bubble drove the speculation in domain names. Eager speculators saw good domains changing hands for millions or at least thousands. And unlike the dot.bomb bubble, domain name speculation is still rife in the .com gTLD.

The .info and .biz gTLDs are still small. Many of the registrations in .info are freebies - domains given to holders of the equivalent .com domains by registrars eager to promote the gTLD. The core count of .info domains could be around the 1.5 million mark rather than the 3.6 million or so .info domains registered. The .biz gTLD has around 1.2 million domains. The .com gTLD on the other hand has around 39.8 million domains registered. While .eu may have a large market, it is this existing market that it has to overcome.

So what drives the growth in the .com market? The big registrars have millions of domains registered but it is the resellers with their own branding that account for many of these domains - exactly the kind of registrar/reseller model that the bozos in Brussels want to ignore. But .eu will be an unregulated gTLD and it will be open to just the same kind of cybersquatting and speculation that inflates other gTLDs. In a few years’ time, will .eu be a competitor to .com or will it be a case of dot who?

Written by John McCormac on July 30th, 2005 with comments disabled.
Read more articles on Domains And Statistics.

Content Filter Company Scraping Around?

Last year, Secure Computing Corporation claimed that .ie ccTLD had tens of thousands of pages of iffy content. It claimed to have done a “global study” of the number of porn pages on the web and it found that the ccTLDs were riddled with the stuff. They had millions of pages of it. Of course .com/net/org/biz/info websites were not included in this “study”. This “global study” amounted to nothing more than entering a few obviously dodgy keywords into Google and limiting the results by using site:.cctld.

It was a very crude attempt by SCC to market its content filtering software. ENN ran it without question but later corrected the article after getting the headline “Study reveals 60,000 Irish porn sites” seriously wrong. (It was 68,000 webpages rather than websites.) Silicon Republic did a good analysis.

So what has this got to do with the present day? Well it seems that an IP from SCC has been sniffing around on some Irish sites with an incompetently forged browser User Agent: Microsoft_Internet_Explorer_5.00.438. Perhaps a new press release on another dubious “study” should be expected. I wonder if this time, editors will be so eager to run SCC’s claims without verification.
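Why that User Agent looks forged is easy to show. Genuine Internet Explorer of that era identified itself with a string of the form "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)". The check below is a rough sketch for log analysis; the function name and the pattern are assumptions for illustration, and a string that passes it can of course still be forged more competently.

```python
import re

# Shape of a genuine Internet Explorer User Agent string,
# e.g. "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)".
MSIE_PATTERN = re.compile(r"^Mozilla/\d\.\d+ \(compatible; MSIE \d+\.\d+")


def looks_like_forged_ie(user_agent):
    """True if the string claims to be Internet Explorer but is not in the genuine format."""
    lowered = user_agent.lower()
    claims_ie = "internet_explorer" in lowered.replace(" ", "_") or "msie" in lowered
    return claims_ie and MSIE_PATTERN.match(user_agent) is None


if __name__ == "__main__":
    for ua in ("Microsoft_Internet_Explorer_5.00.438",
               "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"):
        print(ua, "->", looks_like_forged_ie(ua))
```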

Written by John McCormac on July 28th, 2005 with comments disabled.
Read more articles on Search Engines.

Microsoft Sues Google Over Competition

Microsoft does not like competition. It either assimilates it or crushes it. But with Google, it may have found its match. A key player in Microsoft’s search operation apparently defected to Google. Microsoft is suing. Dr Kai-Fu Lee was corporate vice president of Microsoft’s Interactive Services Division. Google wants him to head its China operation.

The News.com article has some interesting items from the lawsuit. It states that Dr Kai-Fu Lee had been “responsible for overall development of the MSN Internet search application” and had been involved in Microsoft’s China strategy.

The general reaction to Microsoft’s search offering has been mixed. Some people think it is good for Google to have competition. On country level search (specifically Ireland), it does not seem to move much beyond the simplistic country IP/country code TLD (ccTLD) grouping of websites. In this respect it seems that Microsoft, like Google, cannot identify websites from specific countries hosted outside that country’s IPs/ccTLDs - a problem that WhoisIreland.com has solved.

Ultimately this lawsuit may not be about the search business. It may really be about China as a future market for Microsoft. Like all great empires, Microsoft needs to continue expanding. China may be crucial to the survival of Microsoft. However, Google also needs to keep expanding.

Written by John McCormac on July 22nd, 2005 with comments disabled.
Read more articles on Search Engines.

Metadata - A great idea?

Metadata is a great idea. It will be even better when websites actually use it on a widespread basis. The statistics for .ie websites show how poorly metadata is used:

Websites With Title, Keywords and Description: 10460

That’s out of approximately 36198 .ie websites. There are significant opportunities for Search Engine Optimisation companies in Ireland, if only the owners of the websites can be convinced of the importance of SEO and its effects.
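In percentage terms, 10460 out of 36198 is a little under 29%. A check along these lines can be sketched with Python's standard HTML parser. The criteria here (a non-empty title, a keywords meta tag and a description meta tag with content) are an assumption about what counts as "full" metadata; the class and function names are made up for the example.

```python
from html.parser import HTMLParser


class MetaCheck(HTMLParser):
    """Record which of title, keywords and description a page supplies."""

    def __init__(self):
        super().__init__()
        self.found = set()
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            content = (attrs.get("content") or "").strip()
            if name in ("keywords", "description") and content:
                self.found.add(name)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only count the title if it actually contains text.
        if self._in_title and data.strip():
            self.found.add("title")


def has_full_metadata(html):
    """True if the page has a title plus keywords and description meta tags."""
    checker = MetaCheck()
    checker.feed(html)
    return checker.found == {"title", "keywords", "description"}


# Share of .ie websites with all three, using the figures quoted above.
share = 10460 / 36198
print(f"{share:.1%} of .ie websites have title, keywords and description")
```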

Metadata was great in the 1990s. Back then technology was expensive and it was a lot cheaper and easier for search engines to strip the metadata from a page and use it instead of the body text of the page. The falling cost of hard drive space and processing power changed all that and made it possible for search engines to cache complete copies of the webpages and implement better searching algorithms.

Google and its link based algorithm changed the emphasis from metadata to link structure. It was quite innovative but turned out to be as easily gamed as many other algorithms.

Now there is talk of the Semantic Web being the next big thing in search. Again this is another nice theoretical solution that ignores the reality of the patchy nature of the web. These “solutions” have to be easy to use. They have to be integrated at an almost organic level in the web design programs.

It is over ten years since the appearance of the web’s first major search engines. Full metadata is still not included in the bulk of webpages. So what hope is there for the Semantic Web? Will it end up being an academic exercise in futility?

Written by John McCormac on July 19th, 2005 with comments disabled.
Read more articles on Search Engines.

Local Search - Defining “Near”

Local search is more than just matching websites to a location. A few years ago, I did a lot of research on local search, theorising and also building experimental local search engines. One of them was a mobile phone based search engine. It was perhaps a bit more advanced than a simplistic SMS based query search engine interface in that it took the user’s location into account in generating the results.

The operator of the searchtheowl website quoted an entire Google Labs newsgroup post of mine from around that time in a post on his blog, which shows how badly understood “local search” is, even today.

Localised and Local search is more than just stuffing a pile of URLs in a database and claiming that they are local because they are in the same country or even in the same county. The problem with local search is that the user wants to know what websites or resources are “near” to them. It is the definition of the term “near” that is at the heart of local search.
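One simple way to pin down “near” is a distance cut-off around the user's position. The sketch below is not the experimental search engine mentioned above; it is a minimal illustration using the haversine great-circle formula, and the function names, the tuple layout of `places` and the choice of radius are all assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in degrees (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(user, places, radius_km):
    """Return (name, distance_km) for places within radius_km, nearest first.

    `user` is a (lat, lon) pair; `places` is a list of (name, lat, lon) tuples.
    """
    hits = []
    for name, lat, lon in places:
        d = distance_km(user[0], user[1], lat, lon)
        if d <= radius_km:
            hits.append((name, d))
    return sorted(hits, key=lambda hit: hit[1])
```

A fixed radius is the crudest possible definition. What counts as “near” for a pub is not what counts as “near” for an airport, so a real local search would probably need the radius to vary with the kind of resource and with how the user is travelling.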

Written by John McCormac on July 16th, 2005 with comments disabled.
Read more articles on Search Engines.

Search Engines Moving On Blogs?

Steve Rubel’s Micro Persuasion blog came across Yahoo’s test RSS search and posted images in this post. Yahoo pulled the test site. But it is interesting that Yahoo is taking RSS search so seriously.

The large search engines (Google, Yahoo, MSN) have been slowly evaluating adding blog search to their indices. Some, like Google, have incorporated a lot of blogs in their live index. Yahoo and MSN have also been busy. This Business Week article outlines some of the background.

Written by John McCormac on July 16th, 2005 with comments disabled.
Read more articles on Search Engines.

Mapping The Web

One of the current projects at WhoisIreland.com is building a country based map of the web. The main datasets are the com/net/org/biz/info gTLDs. Essentially it means building an IP based map of the websites associated with approximately 54 million domains.

From a purely computational viewpoint, it cannot be thought about in numerical terms. It has to be thought about in abstract mathematical terms. The small gTLDs such as .info and .biz only take a few days to map. But the larger gTLDs take longer.

This is the approximate size of the problem:

.com 39.5 Million domain names.
.net 6.1 Million domain names.
.org 3.7 Million domain names.
.info 3.64 Million domain names.
.biz 1.2 Million domain names.

On any day, there can be upwards of 450K new domains and 450K deleted domains. The utilisation for the gTLDs is around 70%. That means that approximately 70% of the domains are active. But that figure drops once the on-hold and parked domains are removed from the dataset. The .info gTLD was artificially inflated by the addition of “free” .info domains to owners of existing .com domains. In real terms, it is slightly bigger than the size of the .biz gTLD.
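The arithmetic behind these figures is simple enough to check. The snippet below just adds up the counts listed above and applies the 70% utilisation figure and the 450K daily figure; the variable names are only for illustration.

```python
# Domain counts in millions, as listed above.
DOMAINS_MILLIONS = {
    ".com": 39.5,
    ".net": 6.1,
    ".org": 3.7,
    ".info": 3.64,
    ".biz": 1.2,
}

UTILISATION = 0.70         # approximate share of domains in use
DAILY_NEW_MILLIONS = 0.45  # "upwards of 450K" new domains on a busy day

total = sum(DOMAINS_MILLIONS.values())
in_use = total * UTILISATION
daily_new_share = DAILY_NEW_MILLIONS / total

print(f"Total domains: {total:.2f} million")
print(f"In use at 70% utilisation: {in_use:.1f} million")
print(f"New domains on a busy day: {daily_new_share:.2%} of the total")
```

The total comes to about 54 million, which matches the figure quoted for the mapping project, and 70% of that is a little under 38 million domains before parked and on-hold domains are taken out.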

Many of these domains, perhaps as many as 30%, are speculative and are either parked on a hosting company’s servers or are not properly set up. There is no integrity checking in these gTLDs so it is not unusual to see a nameserver on an IP that does not exist. Microscopic country code TLDs like .ie have integrity checks built into the system - they check to see if the nameservers are answering for the domain prior to including the domain in the ccTLD zonefile. However, the size of the gTLDs makes such prior checking impractical.
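The idea of an integrity check before a domain goes into the zonefile can be sketched as follows. This is not how any registry actually implements it. The default check here is much weaker than the one described above: it only tests that the nameserver hostname resolves to an IP, using the standard library, whereas a real check would query the nameserver to see if it answers for the domain. The function names and the `delegations` layout are assumptions for the example.

```python
import socket


def nameserver_resolves(nameserver, domain, resolve=socket.gethostbyname):
    """Weak check: does the nameserver hostname resolve to an IP at all?

    `domain` is unused here; it is accepted so that a stronger check
    (one that asks the nameserver about the domain) has the same signature.
    """
    try:
        resolve(nameserver)
    except OSError:
        return False
    return True


def build_zone(delegations, check=nameserver_resolves):
    """Split delegations into those accepted into the zone and those rejected.

    `delegations` maps a domain to its list of nameserver hostnames. A domain
    is accepted only if it has nameservers and every one passes the check.
    """
    accepted, rejected = {}, {}
    for domain, nameservers in delegations.items():
        ok = bool(nameservers) and all(check(ns, domain) for ns in nameservers)
        (accepted if ok else rejected)[domain] = nameservers
    return accepted, rejected
```

Even this weak version needs at least one lookup per nameserver per domain, which gives some idea of why checking tens of millions of gTLD domains in advance is a different proposition from checking a small ccTLD.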

So what can be done with all this data? The obvious answer is that it can provide a good starting point for the “Ghosthunter Algorithm” mentioned previously. It also provides raw search engine indices for every country with a presence on the net.

Written by John McCormac on July 16th, 2005 with 2 comments.
Read more articles on Domains And Statistics.
