The Invisible Web: Where Search Engines Fear to Go
Searching the Web for business information has never been easier, given the
multitude of directories, search engines and portals. Unknown to the casual
Web surfer, however, there is a second and vital part of the Web: the
"invisible Web."
By Ian Smith, Competia
This article is used with the express permission of Competia Online Magazine.
Searching the Web for business information has never been
easier, given the multitude of directories, search engines
and portals offered online. A collection of these tools
should make up an individual's "search toolbox," to fall
back on when a search with the well-known Internet search
services comes up empty. A personal "search toolbox" may
well retrieve the information you need; however, there may
be more information available online than your chosen
tools are able to reach.
The popular search tools (Yahoo, Hotbot, AltaVista, etc.)
are designed to retrieve information from the "visible
Web." Unknown to the casual Web surfer, there is a
second and vital part of the Web, the "invisible
Web," where the information you have spent hours
looking for may be stored.
The Visible Web vs. the Invisible Web
As the term suggests, the visible Web is the part of the
Web that is recognized by the numerous search tools. Search
engines recognize content on the Web by indexing documents,
that is, by storing their content in a searchable database.
According to various search engine experts, the top search
tools fail to index 70%-75% of the pages on the Web. As the
Web grows exponentially (its total size now estimated to be
approaching a trillion individual pages or documents), the
share of the Web that is indexed will shrink further,
because search engine managers have decided to stop indexing
the millions of documents submitted for inclusion in their
databases. As a result, users will not find the current
information they expect to see listed on the top engines.
For information needs that the search engines cannot
satisfy, the invisible Web offers the best outlet for
searching and retrieving content.
The invisible Web is a rich source of online documents
that the Web search tools are not suited to index.
These documents fall into two groups:
Non-indexed Pages
As mentioned before, for a search engine to retrieve a
document on the Web, the document must be indexed. To be
indexed, the document must include coding and links that
the various search tools' spiders are able to follow.
Documents whose text is locked inside graphics, CGI scripts,
Macromedia Flash or PDF files generally cannot be indexed.
They therefore slip into the invisible Web, where more
experienced and skilled Web searchers can find them. Search
engines are now being created specifically to index
Macromedia Flash or PDF files; however, these engines have
not yet been perfected and do not always retrieve accurate
information for their users.
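The link-following behavior described above can be illustrated with a small sketch. This is not how any particular search engine works; it is a toy model in which the site, its page names and the `crawl` function are all hypothetical. It shows only that a spider starting from a home page never reaches a document that no other page links to.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
SITE = {
    "home.html": ["products.html", "about.html"],
    "products.html": ["home.html"],
    "about.html": [],
    "orphan.html": [],    # assumed: no page links to it
    "brochure.pdf": [],   # assumed: no page links to it
}

def crawl(start, links):
    """Return the set of pages reachable from `start` by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

visible = crawl("home.html", SITE)
invisible = set(SITE) - visible

print(sorted(visible))    # pages the toy spider can reach
print(sorted(invisible))  # pages it never discovers
```

In this model the two unlinked documents end up in the "invisible" set even though they sit on the same server as the indexed pages.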
Databases
The majority of the content of the invisible Web sits in
databases. When an indexing spider comes across a database,
it is effectively locked out: the content is produced only
in response to a query, so there are no links for the spider
to follow and no way to connect the content to a search
engine. In addition, an increasing number of online database
designers are making it difficult for engines to index their
documents. On the other side of the equation, designers of
indexing spiders are investing in software that will allow
search engines to gain access to online databases.
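A rough sketch of why a database front end stops a spider: the page below is a hypothetical search form, and the `LinkExtractor` class is an illustrative stand-in for a spider that follows only ordinary hyperlinks. It uses Python's standard `html.parser` module. The records behind the form never appear as links, so the spider has nothing to follow.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags and count <form> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.forms = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "form":
            self.forms += 1

# Hypothetical front page of a database-backed site.
PAGE = """
<html><body>
  <a href="/about.html">About</a>
  <form action="/search" method="get">
    <input type="text" name="q">
    <input type="submit" value="Search patents">
  </form>
</body></html>
"""

parser = LinkExtractor()
parser.feed(PAGE)

print(parser.links)  # the only URL a link-following spider would queue
print(parser.forms)  # the form is seen, but no query is ever submitted
```

A spider built this way records the "About" page and moves on; every record that would be returned by typing a query into the form stays out of the index.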
Size of the "Invisible Web"
Given this definition of the invisible Web, one can assume
that it is very large. According to a study by BrightPlanet,
a South Dakota company that has developed new software to
plumb the Internet's depths, the Web contains over 550
billion documents, of which search engines index only about
1 billion pages. This sizeable body of documents remains to
be uncovered and comes packaged in different forms online.
The majority of invisible Web content is stored in the
following types of resources:
- Medical databases
- Discussion lists
- Patent databases
- Phone numbers, e-mails, addresses, etc.
- Government databases
- Scientific databases
- Auction databases
- Legal databases
- Dictionaries, thesauri, etc.
- Knowledge databases
- Product catalogs
Keep in mind that some of these databases are password
protected, and obtaining access to certain ones (e.g.,
legal and patent databases) could involve some work.
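To put the BrightPlanet figures in proportion, the short calculation below works out what share of the Web the search engines would cover. It uses only the two numbers quoted in this section and assumes, for the sake of illustration, that "documents" and "pages" can be counted as the same unit.

```python
# Figures quoted in the text (BrightPlanet study).
total_documents = 550_000_000_000  # "over 550 billion documents"
indexed_pages = 1_000_000_000      # "about 1 billion pages"

# Assumption: documents and pages are treated as comparable units.
indexed_share = indexed_pages / total_documents
unindexed_share = 1 - indexed_share

print(f"indexed: {indexed_share:.2%}")        # indexed: 0.18%
print(f"not indexed: {unindexed_share:.2%}")  # not indexed: 99.82%
```

On these figures, fewer than one document in five hundred would be reachable through the popular search tools.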
Tools for the Invisible Web
It would be incorrect to assume that there are no tools
available to hunt for information on the invisible Web. With
the explosion of content on the invisible Web, Web-search
software firms and experienced librarians have supplied
information specialists with tools designed to retrieve
information from various databases. Figure 1 summarizes
search tools for scouring the invisible Web efficiently.
Figure 1: Search Tools for the Invisible Web

| Tool | Description |
| --- | --- |
| CompletePlanet | Provides "the most complete listing available of 'surface' Web-search engines and 'deep' Web searchable databases." Most effective use is to search, not browse, listed resources. |
| Direct Search | Provides access to the search interfaces of Invisible Web resources that are not easily searchable from the major search engines. |
| Fossick.Com | Contains over 3,000 specialty databases and search engines across most academic disciplines and popular topics. |
| Infomine | An "academic" search engine, focusing on scholarly resource collections, electronic journals & books, online library card catalogs, and directories of researchers. |
| Internet Oracle | Search forms and direct links to hundreds of search engines, from general purpose directories to niche topic indexes. |
| InvisibleWeb.com | A directory of more than 10,000 specialized databases available on the Web. |
| Lycos Directory of Searchable Databases | Lists of reference databases in scholarly and popular topics. |
| The BigHub | An index of over 3,000 subject specific searchable databases in over 300 categories. |
| Webdata.com | A database portal, specializing in finding, categorizing and organizing online databases, and providing annotated links with quality rankings. |
Conclusion
As the Web grows, content providers will face an
increasingly difficult task in disseminating information.
Presently, users of the Internet depend on the popular
search engines and portals as their outlet for information.
Unfortunately, these search tools are not capable of
gaining access to databases that hold valuable data which
may influence business decisions. To gain access to these
databases, one must become familiar with the structure of
the invisible Web. The invisible Web holds the majority of
the documents on the Internet, yet these documents are
rarely retrieved. The search tools on offer do an adequate
job of ferreting out information; however, they need to
improve before decision makers can rely on the invisible
Web for their business information needs.