Search Engines

You are currently browsing the articles from WhoisIreland Review matching the category Search Engines.

.eu - Less than 16% Of Websites Actively Developed?

The figure for active web development in .eu is now close to 16%. I’ve been refining the parsing (classifying holding pages and redirects based on frame src tags, duplicate content checking etc) and the active web figure now stands at 286222 websites out of the initial 1.436M websites. That’s 19.94% of the websites and 16.16% of the total resolving .eu domains. The .eu ccTLD is a disaster zone compared to real ccTLDs. In comparison, the .ie figure is around 57% of websites actively developed - a far higher figure.
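As a rough check on the arithmetic above, the two percentages can be reproduced from the raw counts. This is only a sketch: the 1.77M resolving-domain figure is an assumption carried over from the July 10th survey post, and both denominators are rounded, so the results should agree with the quoted 19.94% and 16.16% only to within rounding.

```python
def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole

ACTIVE_SITES = 286_222         # actively developed websites (from the post)
TOTAL_SITES = 1_436_000        # "1.436M websites", rounded
RESOLVING_DOMAINS = 1_770_000  # assumption: "approximately 1.77M resolving domains"

web_pct = pct(ACTIVE_SITES, TOTAL_SITES)
total_pct = pct(ACTIVE_SITES, RESOLVING_DOMAINS)

print(f"{web_pct:.2f}% of websites, {total_pct:.2f}% of resolving domains")
```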

There has to be a critical mass of natural web development in an extension to make the extension viable for both business and speculation. It is that natural web development that makes an extension valuable.

The current figures show 112685 websites parked with PPC. That’s 7.85% of the websites and 6.36% of the domains. The aggregators/warehousers/direct navigation networks account for 126257 websites. That’s 8.79% of the websites and 7.13% of the domains. So effectively 16.64% of the websites are PPC monetised - that excludes those using Google Adsense or other webmaster monetisation.

I’m not sure if the uncertainty caused by EURid’s bungling attempts at clamping down on phantom registrars was the problem. The problem was the European Commission awarding the contract to run the .eu ccTLD to a ccTLD registry venture with no real gTLD experience. The .eu ccTLD is not really a ccTLD but rather a gTLD. The legal framework was botched as well. If it had specified prior rights and prior use then a lot of the Sunrise problems would not have happened. Some landrush speculators pooled their resources to snap up names of existing European businesses and websites. Many of these domain names were the .eu variants of European small businesses who could not really afford an expensive ADR. These small businesses form the core of any ccTLD.

Many of those domains registered by those phantom registrars are still registered and a lot are framed Sedo parking pages. Others have no nameservers so that they do not appear to be active. There are .eu domains registered with obviously fake addresses and EURid has taken no action for over a year. It seems that EURid management doesn’t care about running .eu as long as it can tell its political masters that the extension is a great success with millions of domains registered.

But grouping all speculators together is dangerous. Some speculators are there to develop websites and provide that essential natural web development growth to the extension. Others are there to flip the domains or monetise the domains with PPC. The opportunity is still there but the audience is not.

It will take years for .eu to recover from the damage caused by EURid’s incompetent handling of the landrush and phantom registrar issues. It may not even recover until after EURid loses the contract to run the ccTLD and the ccTLD is reorganised by people who actually understand the domain name industry.


Written by John McCormac on July 17th, 2007 with 4 comments.
Read more articles on Search Engines and Domains And Statistics.

.eu Website Survey - Less Than 22% Actively Developed

WhoisIreland.com surveyed over 1.436 million .eu websites in June. Just over 26% of these websites, which is less than 22% of all resolving .eu domains, were actively developed or not yet classified. This active development figure is likely to be downgraded. Out of approximately 1.77M resolving domains (from 2.15M tracked - 2.13M were surveyed) there are approximately 1.436M websites.

Webtype  Websites    Web %  Total %
A          373612  26.0223  21.0975
B           82188   5.7244   4.6411
D           46450   3.2353   2.6230
F           96342   6.7103   5.4403
H          310639  21.6362  17.5414
N            3088   0.2151   0.1744
P          106361   7.4081   6.0061
R          275886  19.2156  15.5790
S            8331   0.5803   0.4704
U            4565   0.3180   0.2578
W          126224   8.7916   7.1277
X            2053   0.1430   0.1159

A: Active/not yet classified.
B: Brand protection registration.
D: refresh in webpage.
F: Forbidden or other 4nn code.
H: Holding page with no content.
N: Duplicate content network of sites.
P: PPC parked.
R: Redirected (301/302 codes).
S: Site is for sale or rent.
U: Site unavailable (127.0.0.1 is not a valid IP etc).
W: Domain aggregation network sites.
X: Porn sites.

The classification process is still underway, and the actively developed websites figure is continually being downgraded as “coming soon” and parking sites are identified. It would not be unthinkable to see a figure closer to 10% for the number of active .eu websites.

The usage of .eu is a disaster. However, it may have some attractions for businesses that operate on a Europe-wide basis. But as a domain for Europe, it is irrelevant.

The classification process is based on search engine index building methods. The response codes are only the start of the process. The process itself involves analysing the titles, keywords and descriptions for each site and comparing the HTML.
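A first pass of this kind of classification might look something like the sketch below. It is not the survey's actual classifier: the hint strings, the regular expressions and the `classify` function are all hypothetical, the reading of "D" as a meta refresh is an assumption, and only a few of the webtype letters are covered. It simply shows the response code being checked first and the title and page text second.

```python
import re

# Hypothetical hint strings; a real classifier would need far larger lists.
PARKING_HINTS = ("sedo", "parked", "domain parking")
HOLDING_HINTS = ("coming soon", "under construction")
SALE_HINTS = ("for sale", "for rent")

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.I | re.S)
META_REFRESH_RE = re.compile(r"<meta[^>]+http-equiv\s*=\s*[\"']?refresh", re.I)

def classify(status, html):
    """Return one of the survey's webtype letters for a fetched page."""
    # Response codes are only the start of the process.
    if status in (301, 302):
        return "R"  # redirected
    if 400 <= status < 500:
        return "F"  # forbidden or other 4nn code
    if META_REFRESH_RE.search(html):
        return "D"  # refresh in webpage (assumed to mean meta refresh)
    match = TITLE_RE.search(html)
    title = match.group(1).strip().lower() if match else ""
    text = html.lower()
    if any(hint in text for hint in PARKING_HINTS):
        return "P"  # PPC parked
    if any(hint in title for hint in SALE_HINTS):
        return "S"  # site is for sale or rent
    if not title or any(hint in title for hint in HOLDING_HINTS):
        return "H"  # holding page with no content
    return "A"      # active / not yet classified

print(classify(200, "<title>Coming Soon</title>"))
```

Anything that falls through every check stays in "A", which is why the active figure keeps being downgraded as more holding and parking patterns are recognised.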

In any case, this may provide the basis for a .eu web directory or search engine. But are people really interested in the .eu ccTLD?

Written by John McCormac on July 10th, 2007 with 4 comments.
Read more articles on Search Engines and Domains And Statistics.

SEO Claims About Irish Websites

Some of the recent press releases from various companies trying to flog SEO services to large companies would be comical if they weren’t so tragic. Real search engine optimisation involves a lot more than merely looking at what tags and metadata are present in a webpage. One new company trying to flog SEO to the Irish market even sent out its press release before it had its website operational. Another did a survey of what it claimed was the top 100 Irish company websites. Now how would it know? There are at least 200K Irish domains and many of them would get more traffic than some of what are supposed to be the top Irish companies.

WhoisIreland will publish a survey of identified Irish websites detailing their title / keyword / description and other metadata. Spidering the websites is the easy part. This is something that WhoisIreland does anyway to provide the stats on the front page. The hard part is developing an accurate parser because sometimes HTML is not written in a textbook format.
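To illustrate why the parser is the hard part, here is a minimal sketch of pulling the title, keywords and description out of sloppy HTML with Python's standard `html.parser` module. The `MetaExtractor` class and `extract_metadata` function are hypothetical names for illustration and not WhoisIreland's parser; the point is only that an event-based parser copes with uppercase tags, unquoted attributes and unclosed elements.

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collect the <title> text and the keywords/description meta tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names for us.
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            name = (attributes.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.meta[name] = (attributes.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def extract_metadata(html):
    """Return a dict with the title and any keywords/description found."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), **parser.meta}

sample = ('<HTML><HEAD><TITLE>Acme Ltd</TITLE>'
          '<META NAME=Description CONTENT="Widgets &amp; more">'
          '<meta name="keywords" content="a, b"/></head><body><p>unclosed')
print(extract_metadata(sample))
```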

A preliminary .ie section is already done. The interesting thing is that there are parked .ie domains but the number is dwarfed by the number of active .ie websites. Some interesting patterns are emerging. Some sites use Google Adsense for monetisation but Yahoo’s Publisher Network hardly even registers. There are some 12 .ie domains that are parked on Sedo. Compared to .eu and many other ccTLDs, the .ie ccTLD is more utilised and healthier.


Written by John McCormac on March 27th, 2007 with 5 comments.
Read more articles on Search Engines and Tech Commentary.

Google Blogsearch Gets Pingable - Trouble For Technorati?

A post on Tom Raftery’s blog points to a very interesting development in the Blogosphere. Google has announced that bloggers can now ping it directly for inclusion in its blogsearch index. This move is interesting because it means that Google is taking blogging seriously. It also could spell trouble for Technorati and other blog search operations.

This blog seems to be having problems with Technorati. Technorati fails to update its information about the blog despite being pinged, and e-mails to its support desk go unanswered. Now that Google is in the game, the writing may be on the wall for blog aggregators and blog search sites. And having blogs integrated into Google provides a one stop shop for searchers - it has the audience that these other blog search sites do not.

Written by John McCormac on October 10th, 2006 with 2 comments.
Read more articles on Search Engines.

Building A .eu Search Engine?

Is it worth spending time building a .eu search engine? The launch of .eu has been a disaster because an incompetent registry couldn’t deal with the whole TLD being subverted by some clever business people. But even so there must be a few thousand new sites in .eu that are not PPC mortgage, creditcard or ringtone linkswamps.

Most .eu websites, so far, are either “coming soon” pages, PPC network pages or parking pages. There is very little evidence of major sites using their .eu domain as a primary identity. Most are just pointing their .eu domain at their main .com or ccTLD website. Ironically this narrows down the problem to finding that rarest of things - a genuine .eu website.

They exist but they are mainly blogs. The newness of the .eu ccTLD has provided some opportunity for bloggers to build new .eu based blogs using domains that were not available in the .com gTLD. But blogs do not make a TLD and it may well be years before there are any major, unique sites in the .eu ccTLD. This does mean that if the dross can be removed, a small and fast search engine covering .eu is feasible.


Written by John McCormac on September 24th, 2006 with 16 comments.
Read more articles on Search Engines.

CTO Resigns And 2 Fired In AOL Search Data Fallout

The fallout over the AOL search data leak continues. The CTO of AOL, Maureen Govern, will leave AOL immediately. Two other employees, the researcher and his supervisor, have been fired. Govern oversaw the division that released the user search data. The actions were announced in two e-mails from Jon Miller, the chairman and Chief Executive Officer of AOL.

The amount of data released by AOL is staggering, but the real problem is the idea that it has been anonymized. Dataminers detect patterns in piles of data that appear to be just a chaotic collection. And naturally some people have been identified from their search patterns.


Written by John McCormac on August 22nd, 2006 with 2 comments.
Read more articles on Search Engines.

AOL Releases User Search Queries

AOL has released the search queries of over 650K of its users over a three month period. The file contains over 20 million searches. Though the users’ identities are anonymised, there could well be patterns of searching that can be determined.

It is somewhat ironic that the Irish Times had an OpEd waffling on about data privacy and state surveillance. The reality is that these e-jits in the Irish media missed the boat. The big threat to privacy is the data that commercial entities such as the search engines compile about users.

AOL Research has pulled the data but Greg Sadetsky has posted a set of mirror sites.


Written by John McCormac on August 7th, 2006 with no comments.
Read more articles on Search Engines.

Searchtheowl Gets Size Of Irish Web Wrong - Again!

Searchtheowl, the search engine that reinvented itself as a directory, seems to have gotten more facts wrong. A post on the Searchtheowl blog really proved that Mike Russen still does not have a clue about the size of the Irish web.

“The truth is is in the world scale Ireland comes around 49th in the world for the rankings for country domain names .ie 22,000.”

As of today, there are 63118 .ie domains. So basically, Mike Russen thinks that there are only about a third as many .ie domains as there really are. The last time the .ie ccTLD was at 22000 domains was in 2001. It is not difficult to find these domain statistics.

Russen gives the following reason for converting searchtheowl to a directory. It is somewhat offensive but it was probably meant in jest:

“When you think that there are 4.5 million alcholics [sic] clinging to rock in the atlantic ocean, the question became ‘was it worth trying to help local business’ - the answer a resounding NO! It became quite boring trying to explain things so gave up. Thats why we stopped searching Ireland.”

I think that Russen found out the hard way that user submissions only work when there is sufficient traffic on a web directory. Beyond that, the process has to either involve a lot of people (like Dmoz) or be automated (as in the way that spiders crawl the web). And for a country level search engine the problems are compounded. With millions of new websites appearing every month, the country level search engine is faced with detecting the handful that might be relevant.

One of the most important aspects of country level search engine design is to have an idea of the size of the webspace that you intend to search. Otherwise you could end up like Searchtheowl - trying to spider what could be upwards of a 15 million page webspace with a PHP script on some budget shared hosting.


Written by John McCormac on July 7th, 2006 with 7 comments.
Read more articles on Search Engines.
