The Deep Web Defined
The Deep Web is content available on the Internet that is hidden from, or undiscovered by, search engines such as Google, Live and Yahoo. Search engines typically ‘crawl’ the Internet by following the links (URLs) found on each page they index. As an example, the page you are currently reading has numerous links to other pages. A search engine will follow those links and index the pages they lead to.
What happens to Internet pages that have no links pointing to them, or that are ‘hidden’ from crawlers behind passwords or other access checks (such as the commonly used CAPTCHA text)? Such pages make up the majority of the Deep Web, and reaching them is a significant challenge for search engines. It is estimated that the content in the Deep Web far exceeds the quantity of content currently available to be crawled. Discovering and indexing a Deep Web page is termed ‘surfacing’.
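The link-following crawl described above, and the way an unlinked page escapes it, can be illustrated with a minimal sketch. It assumes a tiny in-memory "web" rather than real network requests; the page paths, the PAGES dictionary, and the crawl and LinkExtractor names are all hypothetical, chosen only for illustration. It is not how any particular search engine is implemented.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical in-memory "web": path -> HTML body. No network access is used.
PAGES = {
    "/home": '<a href="/about">About</a> <a href="/news">News</a>',
    "/about": '<a href="/home">Home</a>',
    "/news": '<a href="/about">About</a>',
    "/orphan-report": "<p>No page links here.</p>",
}


def crawl(seed, pages):
    """Breadth-first crawl from seed, returning paths in discovery order."""
    seen = [seed]
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        parser = LinkExtractor()
        parser.feed(pages.get(url, ""))
        for link in parser.links:
            # Only follow links to pages that exist and are not yet seen.
            if link in pages and link not in seen:
                seen.append(link)
                queue.append(link)
    return seen


indexed = crawl("/home", PAGES)
hidden = sorted(set(PAGES) - set(indexed))
print("indexed:", indexed)
print("not surfaced:", hidden)
```

In this sketch the crawler starts from "/home" and reaches every page that some other page links to. The "/orphan-report" page exists in PAGES but is never discovered, because no link leads to it; in the terms used above, it remains in the Deep Web until it is surfaced some other way.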
Towards a Semantic Deep Web
Advances in surfacing Deep Web content will make significant amounts of additional information available. A barrier remains, however: in the vast majority of cases the relationships between the data are unknown and undefined. The Semantic Web will greatly assist in defining and understanding these relationships, as well as making the vast volume of data far more usable and search friendly.