Why HTML Scrambling Is Not Secure Encryption
Scrambling the HTML of a webpage with JavaScript is not unbreakable encryption, and it could be a great way to get a site kicked out of the search engines. Indeed, to a search engine operator, a webpage with only metadata and no page text typically looks like spam.
Some websites use JavaScript-based HTML scrambling to protect their source code. Others use it to prevent the saving or printing of webpages. But this obfuscation is sometimes sold to gullible website owners as unbreakable encryption. The reality is that it is very simple to break, as it would have to be: the algorithm to decode it is included in the webpage.
This particular HTML scrambling scheme relies on the browser to decode the “encrypted” HTML source and display it. The algorithm itself is typically included in a fragment of escaped JavaScript. It often looks like this:
eval(unescape('%6B%3D%75%6E%65%73%63%61
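The visible part of that fragment can be decoded without running it. The sketch below is only an illustration: it uses decodeURIComponent rather than the legacy unescape() call the page itself uses (the two agree for plain ASCII escapes like these), and it covers only the characters shown above, since the rest of the string is cut off.

```javascript
// Only the part of the escaped string shown above; the rest is truncated.
const visibleFragment = "%6B%3D%75%6E%65%73%63%61";

// decodeURIComponent turns each %XX escape back into its character.
// Assumption: the escapes are plain ASCII, as they are in this fragment.
const decoded = decodeURIComponent(visibleFragment);

console.log(decoded); // prints "k=unesca"
```

The output is the start of an ordinary JavaScript assignment, so the "encrypted" script is readable as soon as the escapes are reversed.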
Basically, the JavaScript is unescaped, interpreted and run to unscramble the HTML source. The unscrambled webpage is then displayed in the browser. The algorithm from one example is below:
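function und1(s) {
  // 'un' accumulates the unscrambled HTML
  var un = "";
  // l is the length of the scrambled HTML block in characters
  var l = s.length;
  // oh is half of l, rounded to the nearest whole number
  var oh = Math.round(l / 2);
  // the loop: take the character at i and the character at
  // i + oh and put them on to the end of the 'un' string
  for (var i = 0; i <= oh; i++) {
    var a = s.charAt(i);
    var b = s.charAt(i + oh);
    un = un + a + b;
  }
  // the last pass can overshoot, so trim back to l characters;
  // the result is left in the global X, as in the original, and also returned
  X = un.substr(0, l);
  return X;
}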
The scrambled HTML is not that difficult to read. The decoder reads the first character, then the character half the length of the scrambled block further on, and so rebuilds the HTML two characters at a time. To a cryptanalyst, this kind of scrambled text looks different from text encrypted with a hard algorithm. Because the characters are only rearranged and never substituted, it still has the characteristics of natural language, something that ciphertext from a strong algorithm does not have.
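A round trip makes the point concrete. The sketch below is a tidied port of the decoder, paired with a hypothetical scramble() that is not from the original page but is simply the inverse implied by the decoder. The helper sortedChars() and the sample string are also made up for illustration.

```javascript
// Tidied port of the page's decoder: interleave the first and second halves.
function unscramble(s) {
  const half = Math.round(s.length / 2);
  let out = "";
  for (let i = 0; i <= half; i++) {
    out += s.charAt(i) + s.charAt(i + half);
  }
  // The final pass can overshoot, so trim back to the input length.
  return out.substring(0, s.length);
}

// Hypothetical inverse (an assumption, not taken from the page):
// even-indexed characters form the first half, odd-indexed the second.
function scramble(html) {
  let evens = "";
  let odds = "";
  for (let i = 0; i < html.length; i++) {
    if (i % 2 === 0) evens += html.charAt(i);
    else odds += html.charAt(i);
  }
  return evens + odds;
}

// Sorting the characters gives a quick way to compare character counts.
function sortedChars(text) {
  return text.split("").sort().join("");
}

const plain = "<p>Hello, world!</p>";
const scrambled = scramble(plain);

console.log(scrambled);
console.log(unscramble(scrambled) === plain);               // true
console.log(sortedChars(scrambled) === sortedChars(plain)); // true
```

The last check shows why the text still looks like natural language: the scheme is a fixed rearrangement with no key, so every character of the original survives unchanged and with the same frequency.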
From a cryptographic viewpoint, a JavaScript-scrambled webpage offers only the most elementary protection. It may stop casual printing and saving of webpages, but that is it. The model itself is flawed because the unscrambled HTML has to be displayed in the browser and therefore the algorithm to unscramble the HTML has to be included.
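Since the decoder travels with the page, recovering it takes little more than decoding the escaped string instead of evaluating it. The sketch below assumes a made-up page source and a hypothetical helper, revealHiddenScript(). Real pages will vary in quoting and layout, and escapes outside plain ASCII would need the legacy unescape() rather than decodeURIComponent.

```javascript
// Hypothetical page source: a script that hides its code in eval(unescape('...')).
// The payload here is the escaped form of: alert(1)
const pageSource =
  "<script>eval(unescape('%61%6C%65%72%74%28%31%29'))</script>";

// Pull out the escaped payload and decode it instead of evaluating it.
// Returns null when the page contains no such call.
function revealHiddenScript(source) {
  const match = /eval\(unescape\('([^']*)'\)\)/.exec(source);
  if (match === null) return null;
  return decodeURIComponent(match[1]);
}

console.log(revealHiddenScript(pageSource)); // prints "alert(1)"
```

Nothing is executed at any point, so the hidden script can be read safely before deciding what to do with it.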
So why do people use it? Some people want to protect their HTML. Others want to protect the links in their pages from poaching. Some sites have rather dubious links that they want to keep away from the attention of search engines. By using this kind of HTML obfuscation, they think that they can evade search engines and content filters.
However, the downside is that search engine operators are aware of this kind of cloaking, and so are content filter programmers. It would be an easy win for search engines to drop such sites from their indices, and some content filters now apparently block websites with obfuscated HTML.
Written by John McCormac on June 30th, 2005.