A Lot Of The Web Is Not There
Over the last few weeks, WhoisIreland.com has been checking every website in .com/.net/.org/.info/.biz/.ie. The initial results are quite surprising: a lot of the web just is not there. Or, to be more precise, it is “coming soon”. These “coming soon” websites tend to share the same IP. In some cases, the IP behind one of these “coming soon” websites has millions of associated websites. The smaller ones have thousands.
Another interesting aspect is that the number of distinct IPs is far smaller than first expected. With approximately 54 million domains in .com/.net/.org/.biz/.info/.ie, the number of distinct IPs of the associated websites is probably less than a tenth of that number.
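The two observations above (many domains sharing one IP, and far fewer distinct IPs than domains) can be illustrated with a small grouping pass. This is only a sketch and not the code WhoisIreland.com uses: the function names, the threshold, and the sample records (reserved documentation domains and addresses) are all assumptions made for illustration.

```python
from collections import defaultdict


def group_domains_by_ip(records):
    """Map each IP to the set of domains whose website resolves to it.

    `records` is an iterable of (domain, ip) pairs; this input format
    is an assumption for the sketch.
    """
    by_ip = defaultdict(set)
    for domain, ip in records:
        by_ip[ip].add(domain.lower())
    return dict(by_ip)


def shared_ips(by_ip, threshold):
    """Return (ip, domain_count) pairs for IPs hosting at least
    `threshold` domains, largest first."""
    counts = [
        (ip, len(domains))
        for ip, domains in by_ip.items()
        if len(domains) >= threshold
    ]
    return sorted(counts, key=lambda item: (-item[1], item[0]))


# Hypothetical sample data using reserved documentation names and addresses.
records = [
    ("example.com", "192.0.2.10"),
    ("example.net", "192.0.2.10"),
    ("example.org", "192.0.2.10"),
    ("example.ie", "198.51.100.7"),
]

by_ip = group_domains_by_ip(records)
distinct_ratio = len(by_ip) / len(records)  # distinct IPs per domain
heavily_shared = shared_ips(by_ip, threshold=2)
```

On real data the threshold would be in the thousands, and IPs that pass it would be candidates for "coming soon" or parking pages rather than confirmed ones.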
The hard part of the work begins now: crunching this data to provide usable results. The Ghosthunter Algorithm should show up a lot of the hidden Irish websites. These are websites hosted on servers outside of Irish IP space and outside of the identified Irish hosters. A simple test detected two Irish hosters (IEDR resellers) that use UK and US nameservers and IPs. The algorithm uses a number of elements to go beyond the simplistic IP/ccTLD based categorisation that Google/Yahoo/Microsoft use to generate their country level search indices, so it should provide a superior Irish search engine index. The same algorithm can be applied to other countries.
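To make the limitation concrete, here is a minimal sketch of the simplistic IP/ccTLD categorisation described above. It is not the Ghosthunter Algorithm, whose elements are not detailed in this post. The network list is a hypothetical placeholder (a reserved documentation range), not real Irish IP space, and the function name is an assumption.

```python
import ipaddress

# Hypothetical placeholder for "Irish IP space"; a real list would come
# from registry allocation data.
IRISH_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def looks_irish_baseline(domain, ip):
    """Baseline check: the domain is under .ie, or its IP falls inside
    one of the listed networks."""
    if domain.lower().rstrip(".").endswith(".ie"):
        return True
    address = ipaddress.ip_address(ip)
    return any(address in network for network in IRISH_NETWORKS)


# An Irish site on a .com hosted outside the listed networks is missed.
ie_domain = looks_irish_baseline("example.ie", "198.51.100.7")
irish_ip = looks_irish_baseline("example.com", "192.0.2.5")
hidden_site = looks_irish_baseline("example.com", "198.51.100.7")
```

The last case is the kind of "hidden" website the post is concerned with: a baseline of this shape has no way to recognise it, which is why additional signals are needed.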
Tags: Irishblogs - Google - Semantic Web - Irish Search Engines - Domain Statistics - Web Statistics
Written by John McCormac on August 22nd, 2005.