David Green BA (Hons), PgDipLIS, MCLIP    

Section two - Search engines and web directories

Online Information Review - Apr 00 - The evolution of web searching

Web directories explained
What is the difference between a web directory and a search engine? A web directory is:

  • a pre-defined list of websites

  • compiled by human editors

  • categorised according to subject/topic

  • selective

Because humans compile web directories, a qualitative decision about the content of each listed website has already been made. Consequently, web directories are popular with Internet users looking for particular information, who feel that they have a head start in identifying 'the best of the web' for the topic that they're interested in.

When using a web directory, the user can navigate through the listings or search across the entire directory. The major web directories also license search engine indexes to provide secondary results whenever their human-compiled directory fails to produce results matching the user's query. For example, the world's largest web directory, Yahoo!, licenses the Inktomi search index for just this purpose.
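The fallback behaviour described above can be sketched roughly as follows. This is only an illustration: the directory entries, the licensed index, the example URLs and the function names are all hypothetical, and real directories match queries in far more sophisticated ways.

```python
# Hypothetical data: a tiny human-compiled directory and a licensed index.
DIRECTORY = {
    "Business/Research": [
        ("Example Business Library", "http://example.com/biz-library"),
    ],
    "Recreation/Gardening": [
        ("Example Garden Guide", "http://example.org/gardening"),
    ],
}

LICENSED_INDEX = {
    "orchid": ["http://example.net/orchid-care"],
    "gardening": ["http://example.org/gardening", "http://example.net/soil"],
}


def search_directory(query):
    """Return directory listings whose category or title contains the query."""
    q = query.lower()
    return [
        url
        for category, listings in DIRECTORY.items()
        for title, url in listings
        if q in category.lower() or q in title.lower()
    ]


def search(query):
    """Try the directory first; use the licensed index only if it finds nothing."""
    hits = search_directory(query)
    if hits:
        return "directory", hits
    return "search engine", LICENSED_INDEX.get(query.lower(), [])
```

In this sketch a query for "gardening" is answered from the directory alone, while "orchid", which no editor has categorised, falls through to the licensed index.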

As a result of the manual compilation process, web sites that have been indexed by a web directory remain listed within that directory unless they are manually removed, which is highly unlikely. This 'permanence of presence' is not guaranteed for a listing within a search engine index, which makes a listing within a popular web directory such as Yahoo! highly desirable.

Broadly speaking, any web site that consists of several pages of organised links can be considered a web directory. Many individuals, whether experts in their field or those passionate about a particular subject, have compiled such sites. One such voluntary web directory, which has exploded to global status and become a real rival to world-leader Yahoo!, is the Open Directory. Business Researcher's Interests is a web directory of specific relevance to information professionals.


Search engines explained
When using a search engine, the user is searching a database of indexed websites. All search engines have three primary components:

  • a 'spider' that examines web sites

  • an index/database of web site listings

  • interrogation/retrieval software

Search engine spiders
Search engine databases are primarily built up by 'spiders'. Dispatched automatically and frequently by search engines, spiders are programs that search the web for new web pages, index the words and/or links on those pages, and associate each indexed word with the URL of the page on which it appears.
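As a minimal sketch of what a spider does, the example below follows links from a starting page, extracts the words on each page, and records the URLs on which each word appears. It assumes a small in-memory set of pages (the hypothetical `PAGES` dictionary) in place of real web requests, and the class and function names are illustrative only.

```python
from html.parser import HTMLParser
import re


class PageParser(HTMLParser):
    """Collect the words and outgoing links found on one HTML page."""

    def __init__(self):
        super().__init__()
        self.words = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Record the target of every <a href="..."> link.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Split text content into lower-case words.
        self.words.extend(re.findall(r"[a-z0-9]+", data.lower()))


def crawl(pages, start_url):
    """Visit pages reachable from start_url; return word -> set of URLs."""
    index = {}
    queue = [start_url]
    seen = set()
    while queue:
        url = queue.pop(0)
        if url in seen or url not in pages:
            continue
        seen.add(url)
        parser = PageParser()
        parser.feed(pages[url])
        for word in parser.words:
            index.setdefault(word, set()).add(url)
        queue.extend(parser.links)  # follow links to discover new pages
    return index


# Hypothetical pages standing in for the live web.
PAGES = {
    "http://example.com/": '<p>Web directories</p>'
                           '<a href="http://example.com/engines">more</a>',
    "http://example.com/engines": "<p>Search engines index the web</p>",
}
INDEX = crawl(PAGES, "http://example.com/")
```

Here `INDEX["web"]` contains both URLs, because the word appears on both pages, while `INDEX["engines"]` contains only the second.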

Search engine index/database
This is the main element of any search engine - it is what the user interrogates. It could once be said that these indexes were built along similar guidelines, with the location and frequency of words the primary determining factors in ranking results by relevance. However, during 1998, a number of new search engine providers appeared. These companies built their indexes according to differing criteria. The Direct Hit index is based on the 'popularity' of a web site, the Google index is based on the number of links between pages and sites, whilst the Real Names index is a pay-for service that enables companies to register keywords to protect their brands and company identity. Each of these approaches is discussed in further detail below.
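To illustrate the general idea of ordering an index by links rather than by words, the sketch below simply counts how many other pages link to each page. This is a deliberately simplified assumption and not the actual method used by Google or any other provider; the link data and function names are hypothetical.

```python
from collections import Counter

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "a.example"],
}


def inbound_link_counts(links):
    """Count, for every page, how many distinct other pages link to it."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):
            if target != source:  # ignore links from a page to itself
                counts[target] += 1
    return counts


def rank_by_links(links):
    """Order pages by inbound link count, most linked-to first."""
    counts = inbound_link_counts(links)
    return sorted(counts, key=lambda page: (-counts[page], page))
```

With this data, `c.example` is linked to by three other pages and is ranked first, while `d.example`, which nothing links to, is ranked last.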

Interrogation/retrieval software
All search engines have their own customised software to interrogate their databases. Essentially, though, they operate according to similar principles: any web site containing words or terms that match the user's search query appears in the list of results shown on screen. The relevance ranking of these matching web sites is determined by algorithms that analyse the location and frequency of the user's search terms within each one. The nuances of how these algorithms work vary between search engines - which is one reason why users usually see different results when running the same search across different search engines. However, a much more important reason for these differences is that 'the (content) overlap between the engines remains relatively low'. (3)
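A relevance ranking based on location and frequency might look something like the sketch below. It assumes that a term in the page title counts for more than a term in the body; the weight of 3, the sample pages and the function names are all hypothetical, and each real search engine tunes such details differently.

```python
import re

TITLE_WEIGHT = 3  # hypothetical: a title match counts three times a body match


def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(page, query):
    """Score a page by where (title or body) and how often the terms occur."""
    title = tokenize(page["title"])
    body = tokenize(page["body"])
    return sum(
        TITLE_WEIGHT * title.count(term) + body.count(term)
        for term in tokenize(query)
    )


def rank(pages, query):
    """Return the URLs of matching pages, highest score first."""
    scored = [(score(page, query), page["url"]) for page in pages]
    matches = [(s, url) for s, url in scored if s > 0]
    return [url for s, url in sorted(matches, key=lambda x: (-x[0], x[1]))]


# Hypothetical pages for illustration.
PAGES = [
    {"url": "http://example.com/a", "title": "Web directories",
     "body": "A directory lists web sites by topic."},
    {"url": "http://example.com/b", "title": "Gardening",
     "body": "The web has many gardening sites on the web."},
    {"url": "http://example.com/c", "title": "Cooking",
     "body": "Recipes and kitchen tips."},
]
```

For the query "web", page a ranks above page b even though b uses the word more often in its body, because a also has it in the title; page c does not match and is left out. Changing `TITLE_WEIGHT` changes the ordering, which hints at why different engines return different results for the same search.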
