Google Tries to Tap Into the Hidden Web
The web is vast. Incredibly vast. Some estimates put the searchable web at around 11 billion pages. It would take lifetimes to view all that content. But that is just the tip of the iceberg. Behind the searchable, Google-oriented web lies a massive amount of content that is not available to search engines; this is called the hidden, or invisible, web. Some estimates put the amount of data hidden from search engines at more than 15 billion pages, which is larger than what most people would normally call the web.
Pages can be hidden from the search engines for a number of reasons: the content could be dynamic in a way that spiders cannot crawl; the content could be unlinked; access to the content could be restricted; the content could be inside an image or video; or the content could be reachable only through a form. Since the early days of the web, the search engines have wanted to gain access to this uncharted realm of information to enhance their reputation for having the biggest index available to surfers.
Last month Google announced on its Webmaster blog a technological breakthrough in reaching hitherto inaccessible web content. For the past few months Google has been experimenting with having its spider (Googlebot) fill out HTML forms in order to reach hidden content and URLs that it can then index for Google users. Google's blog comments:
"Specifically, when we encounter a form element on a high-quality site, we might choose to do a small number of queries using the form. For text boxes, our computers automatically choose words from the site that has the form; for select menus, check boxes, and radio buttons on the form, we choose from among the values of the HTML."
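To make the quoted description a little more concrete, here is a minimal sketch in Python of how a crawler might turn one form into a handful of query URLs. It is not Google's code, and every name in it (FormFieldCollector, candidate_urls, max_queries, the example.com URL) is a hypothetical chosen for illustration. It assumes the form submits with GET, fills text boxes with words supplied from the site, and takes values for select menus, check boxes and radio buttons straight from the HTML.

```python
from html.parser import HTMLParser
from itertools import islice, product
from urllib.parse import urlencode, urljoin


class FormFieldCollector(HTMLParser):
    """Collects field names and the values offered in a form's HTML."""

    def __init__(self):
        super().__init__()
        self.action = ""
        self.text_fields = []   # names of free-text inputs
        self.choices = {}       # field name -> values listed in the HTML
        self._current_select = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "form":
            self.action = attrs.get("action") or ""
        elif tag == "input" and name:
            kind = (attrs.get("type") or "text").lower()
            if kind == "text":
                self.text_fields.append(name)
            elif kind in ("radio", "checkbox"):
                value = attrs.get("value") or "on"
                self.choices.setdefault(name, []).append(value)
        elif tag == "select" and name:
            self._current_select = name
            self.choices.setdefault(name, [])
        elif tag == "option" and self._current_select and attrs.get("value"):
            self.choices[self._current_select].append(attrs["value"])

    def handle_endtag(self, tag):
        if tag == "select":
            self._current_select = None


def candidate_urls(page_url, form_html, site_words, max_queries=5):
    """Build a small, capped list of GET URLs for one form (assumed GET)."""
    collector = FormFieldCollector()
    collector.feed(form_html)

    # Text boxes get words taken from the site; other fields get HTML values.
    fields = {name: list(site_words) for name in collector.text_fields}
    fields.update(collector.choices)
    fields = {name: vals for name, vals in fields.items() if vals}
    if not fields:
        return []

    names = list(fields)
    # Cap the combinations so only a small number of queries is issued.
    combos = islice(product(*(fields[n] for n in names)), max_queries)
    base = urljoin(page_url, collector.action)
    return [base + "?" + urlencode(dict(zip(names, combo))) for combo in combos]


if __name__ == "__main__":
    html = (
        '<form action="/search">'
        '<input type="text" name="q">'
        '<select name="cat">'
        '<option value="books">Books</option>'
        '<option value="music">Music</option>'
        '</select></form>'
    )
    for url in candidate_urls("http://example.com/page", html, ["jazz", "blues"], 3):
        print(url)
```

The cap on the number of combinations mirrors the "small number of queries" in the quote. A real crawler would also need to decide which sites count as high quality, choose its words far more carefully, and respect robots.txt, none of which is attempted here.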






