Online Information Review
- Apr 00 - The evolution of web searching
Web directories explained
What is the difference between a web directory and a search engine?
A web directory is:
- a pre-defined list of websites
- compiled by human editors
- categorised according to subject/topic
- selective
Because humans compile web directories, a qualitative
decision concerning the content on each listed website has already been
made. Consequently web directories are popular with Internet users looking
for particular information because they feel that they have a head start
in identifying 'the best of the web' for the topic that they're interested
in.
In using a web directory the user can navigate
through the listings or search across the entire directory. The major
web directories also license search engine indexes to provide secondary
results whenever their human-compiled directory fails to produce
results matching the user's query. For example, the world's largest
web directory, Yahoo!, licenses the Inktomi search index for just this
purpose.
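The browse-or-search behaviour described above, with a licensed index as a fallback, can be sketched as follows. This is a minimal illustration only, not a description of how Yahoo! or Inktomi actually work: the `Directory` class, the `lookup` function, and the sample listings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Listing = Tuple[str, str]  # (site title, URL)


@dataclass
class Directory:
    """Hypothetical human-compiled directory: category -> listed sites."""
    categories: Dict[str, List[Listing]] = field(default_factory=dict)

    def add(self, category: str, title: str, url: str) -> None:
        """An editor files a site under a subject category."""
        self.categories.setdefault(category, []).append((title, url))

    def browse(self, category: str) -> List[Listing]:
        """Navigate: return the listings filed under one category."""
        return self.categories.get(category, [])

    def search(self, query: str) -> List[Listing]:
        """Search the whole directory by category name or site title."""
        q = query.lower()
        return [
            (title, url)
            for category, sites in self.categories.items()
            for title, url in sites
            if q in category.lower() or q in title.lower()
        ]


def lookup(directory: Directory,
           fallback_index: Dict[str, List[Listing]],
           query: str) -> Tuple[str, List[Listing]]:
    """Return directory matches, or secondary results if there are none."""
    hits = directory.search(query)
    if hits:
        return "directory", hits
    # Assumed stand-in for a licensed search engine index: word -> listings.
    return "secondary", fallback_index.get(query.lower(), [])


directory = Directory()
directory.add("Business/Research", "Example Research Links",
              "http://example.com/research")
fallback = {"spiders": [("Example spider page", "http://example.org/spiders")]}

print(lookup(directory, fallback, "research"))  # answered by the directory
print(lookup(directory, fallback, "spiders"))   # falls back to the index
```

The point of the design is the order of consultation: the human-compiled listings are tried first, and the machine-built index is used only when they produce nothing.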
As a result of the manual compilation process,
web sites that have been indexed by web directories will remain listed
within that directory unless they are manually removed, which is highly
unlikely. This 'permanence of presence' is not guaranteed with
a listing within a search engine index, which makes a listing within
a popular web directory such as Yahoo! highly desirable.
Broadly speaking, any web site that consists
of several pages of organised links can be considered a web directory.
Many individuals, whether experts in their field or those passionate
about a particular subject, have compiled such sites. One such voluntary
web directory, which has exploded to global status and become a real rival
to world-leader Yahoo!, is the Open Directory. Business
Researcher's Interests is a web directory of specific relevance
to information professionals.
Search engines explained
When using a search engine, the user is searching a database of
indexed websites. All search engines have three primary components:
- a 'spider' that examines web sites
- an index/database of web site listings
- interrogation/retrieval software
Search engine spiders
Search engine databases are primarily built up by 'spiders'. Dispatched
on an automatic and frequent basis by search engines, spiders are programs
that search the web for new web pages, index words and/or links on those
pages, and match the indexed words with the URL of the page on which
they appear.
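A rough sketch of what a spider does, as described above: follow links to discover pages, index the words on each page, and record the URL on which each word appears. To keep the example self-contained and runnable offline, the 'web' here is an assumed in-memory dictionary (`WEB`) of made-up pages rather than live HTTP requests; the `crawl` function and its data are hypothetical.

```python
import re
from collections import deque

# Assumed toy 'web': URL -> (page text, outgoing links). Not real sites.
WEB = {
    "http://a.example/": ("web directories list sites",
                          ["http://b.example/"]),
    "http://b.example/": ("search engines index sites",
                          ["http://a.example/", "http://c.example/"]),
    "http://c.example/": ("spiders index words", []),
}


def crawl(start_url, web):
    """Breadth-first crawl from start_url; return word -> set of URLs."""
    index = {}
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        text, links = web.get(url, ("", []))
        # Match each indexed word with the URL of the page it appears on.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(word, set()).add(url)
        # Queue newly discovered pages so each one is visited only once.
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


index = crawl("http://a.example/", WEB)
print(sorted(index["index"]))
```

The resulting word-to-URL mapping is the kind of structure a user's query is later run against.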
Search engine index/database
This is the main element of any search engine - it is what the user
interrogates. It could once be said that these indexes were built along
similar guidelines, with the location and frequency of words the primary
determining factors in results relevance ranking. However, during 1998,
a number of new search engine providers appeared. These companies built
their indexes according to differing criteria. The Direct Hit index
is based on the 'popularity' of a web site, the Google index is based
on the number of links between pages and sites, whilst the Real Names
index is a pay-for service that enables companies to register keywords
to protect their brands and company identity. Each of these approaches
is discussed in further detail below.
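As a hedged illustration of the link-based idea only (it is not Google's actual method, which this section does not detail), the sketch below orders pages by how many other pages link to them. The link graph and the function name `inbound_link_counts` are invented for the example.

```python
from collections import Counter

# Assumed toy link graph: page -> pages it links to.
LINKS = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example"],
}


def inbound_link_counts(links):
    """Count, for every page, how many distinct other pages link to it."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):  # ignore repeated links from one page
            if target != source:     # ignore self-links
                counts[target] += 1
    return counts


# Most-linked pages first; ties broken alphabetically for a stable order.
ranking = sorted(inbound_link_counts(LINKS).items(),
                 key=lambda item: (-item[1], item[0]))
print(ranking)
```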
Interrogation/retrieval software
All search engines have their own customised software to interrogate
their databases. Essentially, though, they operate according to similar
principles: any web site which contains words or terms that match the
user's search query will appear in the list of results presented
on screen to the user. The relevance ranking of these matching web sites
is determined by algorithms that analyse the location and frequency
of the user's search terms in each of the matching web sites.
The nuances of how these algorithms work vary between search engines,
which is one reason for the different results that users usually experience
when running the same search across different search engines. However,
a much more important reason for these differences in search results is
that 'the (content) overlap between the engines remains relatively low' (3).
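The location-and-frequency principle can be illustrated with a small scoring function. This is a sketch under stated assumptions, not any engine's real algorithm: the weights (`TITLE_WEIGHT`, `BODY_WEIGHT`), the `score` and `search` functions, and the sample pages are all hypothetical.

```python
import re

TITLE_WEIGHT = 3.0  # assumed: a match in the title counts more (location)
BODY_WEIGHT = 1.0   # assumed: each match in the body counts once (frequency)

# Assumed sample pages: URL -> (title, body).
PAGES = {
    "http://a.example/": ("Web directories",
                          "Directories list sites chosen by editors."),
    "http://b.example/": ("Search engines",
                          "Spiders feed the index. Directories differ."),
    "http://c.example/": ("Cooking", "Recipes and nothing else."),
}


def tokens(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, title, body):
    """Sum weighted occurrences of each query term in title and body."""
    title_words, body_words = tokens(title), tokens(body)
    total = 0.0
    for term in tokens(query):
        total += TITLE_WEIGHT * title_words.count(term)
        total += BODY_WEIGHT * body_words.count(term)
    return total


def search(query, pages):
    """Return (url, score) for matching pages, most relevant first."""
    scored = [(url, score(query, title, body))
              for url, (title, body) in pages.items()]
    matches = [(url, s) for url, s in scored if s > 0]
    return sorted(matches, key=lambda item: (-item[1], item[0]))


results = search("directories", PAGES)
print(results)
```

Two engines that chose different weights would order the same matching pages differently, which is the kind of variation in nuance the text refers to.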