The Invisible Web: Where Search Engines Fear to Go

By Ian Smith
Competia

This article is used with express permission from Competia Online Magazine.

 

Searching the Web for business information has never been easier, given the multitude of directories, search engines and portals offered online. A collection of these tools should make up an individual's "search toolbox" for use when a search with the well-known Internet search services comes up empty. A personal "search toolbox" may lead to the information you need; however, there may be more information online than your choice of tools can reach.

The popular search tools (Yahoo, Hotbot, AltaVista, etc.) are designed to retrieve information from the "visible Web." Unknown to the casual Web surfer, there is a second and vital part of the Web, the "invisible Web," where the information you may have spent hours looking for may be stored.

The Visible Web Vs. the Invisible Web

As the term suggests, the visible Web is the part of the Web that the numerous search tools can see. Search engines recognize content on the Web by indexing documents and storing their content in a database. According to various search engine experts, the top search tools fail to index 70%-75% of the pages on the Web. As the Web grows exponentially (its total size is now estimated to be approaching a trillion individual pages or documents), the share of pages the engines cover will shrink further, because search engine managers are deciding to stop indexing many of the millions of documents submitted for inclusion in their databases. As a result, users will not be able to obtain current information they expect to find listed on the top engines. To satisfy their information needs on the Web, users will find that the invisible Web offers the best outlet for searching and retrieving content.

The invisible Web is a rich source of online documents that are not suited to being indexed by Web search tools. These documents fall into two groups:


Non-indexed Pages

As mentioned before, a document on the Web must be indexed before a search engine can retrieve it. To be indexed, the document's authors must include coding and links that the various search tools' spiders can follow. Documents whose text is rendered as graphics, or that are built from CGI scripts, Macromedia Flash or PDF files, cannot be indexed. Therefore, they slip into the invisible Web, where more experienced and skilled Web searchers can find them. Search engines are now being created specifically to index Macromedia Flash and PDF files; however, they have not yet been perfected to retrieve accurate information for their users.
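To make the point about non-indexed pages concrete, here is a minimal sketch, assuming a deliberately simplified indexer that records only the words it finds in a page's HTML text. The names `TextIndexer` and `index_terms` are hypothetical and do not describe how any particular search engine worked; real engines differ, and some also read attributes such as image `alt` text, which this sketch ignores by assumption. A page whose message lives in an image or a Flash file yields no terms at all.

```python
from html.parser import HTMLParser


class TextIndexer(HTMLParser):
    """Hypothetical, simplified indexer: collects only the words in a page's HTML text."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._skip = 0  # depth inside <script>/<style>, whose contents are not page text

    def handle_starttag(self, tag, attrs):
        # Attributes (e.g. img src/alt, embed src) are ignored by assumption.
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.words.extend(word.lower() for word in data.split())


def index_terms(html):
    """Return the lowercase terms this simplified indexer would record for a page."""
    parser = TextIndexer()
    parser.feed(html)
    parser.close()
    return parser.words


if __name__ == "__main__":
    text_page = "<h1>Quarterly Report</h1><p>Sales rose</p>"
    # Same message, but delivered as an image and a Flash movie: no indexable text.
    image_page = '<img src="report.gif"><embed src="intro.swf">'
    print(index_terms(text_page))
    print(index_terms(image_page))
```

Run as a script, the text page produces four searchable terms while the image-and-Flash page produces an empty list, which is why such pages drop out of a text-based index.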

Databases

Databases make up the majority of the invisible Web's content. When an indexing spider comes across a database, it is effectively locked out, because the database's content is retrieved through queries and there are no links leading to it that the spider can follow. In addition, an increasing number of online database designers are making it difficult for engines to index their documents. On the other side of the equation, designers of indexing spider software are investing in software that would allow search engines to gain access to online databases.
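The lock-out described above can be illustrated with a minimal sketch of a link-following spider, assuming a tiny in-memory "site" instead of real HTTP requests. The URLs under `example.com`, the `SITE` dictionary, and the `crawl` function are all hypothetical. The spider follows `<a href>` links only; a search form is noted but never submitted, so a database result page that nothing links to is never reached.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects hyperlink targets a spider can follow, plus search forms it cannot fill in."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []  # absolute URLs from <a href="...">
        self.forms = []  # absolute form actions (queries the spider never submits)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "form":
            self.forms.append(urljoin(self.base_url, attrs.get("action") or ""))


def crawl(site, start_url):
    """Breadth-first crawl of a dict mapping URL -> HTML, following links only."""
    seen, queue, unreachable_forms = {start_url}, deque([start_url]), []
    while queue:
        url = queue.popleft()
        parser = LinkExtractor(url)
        parser.feed(site.get(url, ""))
        parser.close()
        unreachable_forms.extend(parser.forms)
        for link in parser.links:
            if link in site and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen, unreachable_forms


# Hypothetical site: the patent record exists only as the answer to a form query.
SITE = {
    "http://example.com/": '<a href="/about.html">About</a>'
                           '<form action="/search"><input name="q"></form>',
    "http://example.com/about.html": '<a href="/">Home</a>',
    "http://example.com/search?q=patents": "<p>Patent record 123</p>",
}

if __name__ == "__main__":
    indexed, forms = crawl(SITE, "http://example.com/")
    print(sorted(indexed))
    print(forms)
```

The crawl reaches the home page and the About page, records the search form it could not use, and never sees the patent record, even though that page exists on the site. This mirrors, in simplified form, why query-driven database content stays in the invisible Web.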

Size of the "Invisible Web"

Given this definition of the invisible Web, one can assume that it is relatively large. According to a study by BrightPlanet, a South Dakota company that has developed new software to plumb the Internet's depths, the Web contains over 550 billion documents, of which search engines index only about 1 billion. This sizeable body of documents, whose information is packaged in many different forms online, remains largely undiscovered. Most invisible Web content is stored in the following types of databases:

  • Medical databases
  • Discussion lists
  • Patent databases
  • Phone numbers, e-mails, addresses, etc.
  • Government databases
  • Scientific databases
  • Auction databases
  • Legal databases
  • Dictionaries, thesauri, etc.
  • Knowledge databases
  • Product catalogs

Keep in mind that some of these databases are password protected, and gaining access to certain ones (e.g., legal and patent databases) could take some work.

Tools for the Invisible Web

It would be incorrect to assume that there are no search tools for hunting down information on the invisible Web. With the explosion of content there, Web-search software firms and experienced librarians have supplied information specialists with tools that do a good job of retrieving information from various databases. Figure 1 summarizes search tools for scouring the invisible Web efficiently.

Figure 1: Search Tools for the Invisible Web

CompletePlanet: Provides "the most complete listing available of 'surface' Web-search engines and 'deep' Web searchable databases." Most effective use is to search, not browse, listed resources.

Direct Search: Provides access to the search interfaces of Invisible Web resources that are not easily searchable from the major search engines.

Fossick.Com: Contains over 3,000 specialty databases and search engines across most academic disciplines and popular topics.

Infomine: An "academic" search engine, focusing on scholarly resource collections, electronic journals & books, online library card catalogs, and directories of researchers.

Internet Oracle: Search forms and direct links to hundreds of search engines, from general-purpose directories to niche topic indexes.

InvisibleWeb.com: A directory of more than 10,000 specialized databases available on the Web.

Lycos Directory of Searchable Databases: Lists of reference databases in scholarly and popular topics.

The BigHub: An index of over 3,000 subject-specific searchable databases in over 300 categories.

Webdata.com: A database portal, specializing in finding, categorizing and organizing online databases, and providing annotated links with quality rankings.

 

Conclusion

As the Web grows, content providers will face the difficult task of disseminating information. Today, Internet users depend on the popular search engines and portals as their outlet for information. Unfortunately, these search tools lack the ability to access databases that hold valuable data that may influence business decisions. To reach these databases, one must become familiar with the structure of the invisible Web. The invisible Web holds the majority of the documents on the Internet, yet they are rarely retrieved. The search tools now available do an adequate job of surfacing this information; however, they will need to improve before decision makers rely on the invisible Web for their business information needs.


 
