June 2005

You are currently browsing the articles from WhoisIreland Review written in the month of June 2005.

Why HTML Scrambling Is Not Secure Encryption

Scrambling the HTML of a webpage with Javascript is not unbreakable encryption, and it could be a great way to get a site kicked out of the search engines. Indeed, to a search engine operator, a webpage with only meta data and no page text typically looks like spam.

Some websites use Javascript-based HTML scrambling to protect the source code. Others use it to prevent the saving or printing of the webpages. But this obfuscation is sometimes sold to gullible website owners as unbreakable encryption. The reality is that it is very simple to break, and it could hardly be otherwise: the algorithm to decode it is included in the webpage.

This particular HTML scrambling scheme relies on the browser to decode the “encrypted” HTML source and display it in the browser. The algorithm itself is typically included in a fragment of escaped Javascript. It often looks like this:

eval(unescape('%6B%3D%75%6E%65%73%63%61
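One low-risk way to see what such a fragment hides is to decode the escaped string and print it instead of handing it to eval. The following is a minimal sketch, not a full deobfuscator. It uses only the visible start of the fragment quoted above (the rest is truncated and is not reconstructed here), and the helper name revealEscaped is made up for illustration:

```javascript
// Hypothetical helper: decode a percent-escaped string so it can be read
// as text. Nothing is passed to eval, so the hidden script never runs.
function revealEscaped(escaped) {
  // decodeURIComponent handles the ASCII %XX sequences shown here.
  // Scripts escaped with the legacy escape() may also contain %uXXXX or
  // Latin-1 bytes; for those, the deprecated unescape() is the matching decoder.
  return decodeURIComponent(escaped);
}

// Only the visible start of the fragment quoted above.
const fragment = "%6B%3D%75%6E%65%73%63%61";
console.log(revealEscaped(fragment)); // prints: k=unesca
```

Even these few bytes show the start of an assignment that calls unescape again, which suggests the real decoder is wrapped in more than one layer of escaping.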

Basically the Javascript is unescaped, interpreted and run to unscramble the HTML source. The unscrambled webpage is displayed in the browser. The algorithm from one example is below:

function und1(s){var un="";
// 'un' is the unscrambled HTML

l=s.length;
// l is the length of the scrambled HTML block in characters
oh=Math.round(l/2);
// oh is half of l
for(i=0;i<=oh;i++){a=s.charAt(i); b=s.charAt(i+oh); c=a+b;
un=un+c;};
// the loop. Take the character at i and the character at
// i + oh and put them on to the end of the 'un' string

X=un.substr(0,l);
};

The scrambled HTML is not that difficult to read. The first half of the scrambled block holds every second character of the original page and the second half holds the characters in between. The decoder reads the character at position i, then the character at position i plus half the length of the block, and so rebuilds the page two characters at a time. To a cryppie, this kind of scrambled text looks different to text encrypted with a hard algorithm. It still has the characteristics of natural language - something that ciphertext does not have.
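The decoding described here can be tried out with a short round trip. This is only a sketch under stated assumptions: unscramble follows the same logic as und1 but returns its result instead of setting the global X, and scramble is a hypothetical inverse written for this example, since the original scrambling tool is not shown.

```javascript
// Hypothetical inverse of the unscrambler: characters at even positions go
// into the first half, characters at odd positions into the second half.
function scramble(html) {
  let first = "";
  let second = "";
  for (let i = 0; i < html.length; i++) {
    if (i % 2 === 0) {
      first += html.charAt(i);
    } else {
      second += html.charAt(i);
    }
  }
  return first + second;
}

// Same logic as und1, with local variables and a return value.
function unscramble(s) {
  const half = Math.round(s.length / 2);
  let out = "";
  for (let i = 0; i <= half; i++) {
    // charAt returns "" past the end of the string, so overshooting is
    // harmless; any surplus characters are trimmed off below.
    out += s.charAt(i) + s.charAt(i + half);
  }
  return out.slice(0, s.length);
}

const page = "<p>Hello, world!</p>";
const scrambled = scramble(page);
console.log(scrambled);
console.log(unscramble(scrambled) === page); // prints: true
```

Anyone who can view the page source can paste the scrambled block into a function like unscramble and get the original HTML back, with no key to find.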

From a cryptographic viewpoint, a Javascript scrambled webpage offers only the most elementary protection. It may stop casual printing and saving of webpages but that is it. The model itself is flawed because the unscrambled HTML has to be displayed in the browser and therefore the algorithm to unscramble the HTML has to be included.
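The point about scrambled text keeping the characteristics of natural language can be checked directly: a scheme that only moves characters around leaves the count of every character unchanged. The sketch below assumes a stand-in scrambler, rearrange, that puts even-position characters first and odd-position characters second; that function and the sample sentence are illustrative and not taken from any real scrambling product.

```javascript
// Count how often each character occurs in a string.
function charCounts(text) {
  const counts = {};
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    counts[ch] = (counts[ch] || 0) + 1;
  }
  return counts;
}

// Hypothetical stand-in for the scrambler: even-position characters first,
// then odd-position characters. It only reorders, it never substitutes.
function rearrange(text) {
  let first = "";
  let second = "";
  for (let i = 0; i < text.length; i++) {
    if (i % 2 === 0) {
      first += text.charAt(i);
    } else {
      second += text.charAt(i);
    }
  }
  return first + second;
}

// Compare two count tables key by key.
function sameCounts(a, b) {
  const keys = Object.keys(a);
  return keys.length === Object.keys(b).length &&
    keys.every((k) => a[k] === b[k]);
}

const plain = "the quick brown fox jumps over the lazy dog";
const mixed = rearrange(plain);
console.log(sameCounts(charCounts(plain), charCounts(mixed))); // prints: true
```

Text encrypted with a strong cipher would not give this result, because its output is meant to look statistically random. Identical character counts are a strong hint that the "encryption" is only a rearrangement.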

So why do people use it? Some people want to protect their HTML. Others want to protect the links in their pages from poaching. Some sites have rather dubious links that they want to keep away from the attention of search engines. By using this kind of HTML obfuscation, they think that they can evade search engines and content filtering.

However, the downside is that search engine operators are aware of this kind of cloaking, and so are content filter programmers. It would be an easy win for search engines to drop such sites from their indices, and some content filters now apparently block websites with obfuscated HTML.


Written by John McCormac on June 30th, 2005 with comments disabled.
Read more articles on Search Engines.

Searching For A Clue?

A few days ago, the contact e-mail for the WhoisIreland.com spider got an e-mail to say that the site had been included in Ireland’s only dedicated search engine. Considering that WhoisIreland.com is the web’s biggest Irish search engine, it was rather surreal. The fact that there is at least one other Irish search engine, and a pile of other Irish web directories, made it all the stranger. That and the fact that the e-mail began “A Chara”. This is the way that all the Irish government agencies used to start letters like tax demands.

Some of us Irish search engine and directory operators invest thousands of euros in dedicated servers, research and building sites. But the Irish search and directory business is not exactly a business for the clueless. It is a tough battleground where only the best survive against behemoths like Google and Yahoo.

Fergal O’Byrne’s OMNI SEO blog posted an interview with the operator of the site. It didn’t seem to be quite on the level of a real interview. Sure, the buzzwords and the marketing speak were all in place, but the cornerstone of the business was missing. It seems that everyone sees search engine results and thinks that building and maintaining a search engine is as easy as sticking a few URLs in an off-the-shelf PHP script on a shared hosting account and calling it a search engine.

The comments on Fergal’s blog were interesting in that some of the others who were contacted regarded the e-mails as spam. I’m still trying to figure out why someone would e-mail a search engine’s contact address to tell it that it was included in a search engine. Such are the perils of being a search engine operator. I wonder how Google deals with it?


Written by John McCormac on June 17th, 2005 with 3 comments.
Read more articles on Search Engines.