David Green BA (Hons), PgDipLIS, MICLIP    

Section one - Defining the web

Online Information Review - Apr 00 - The evolution of web searching

On the Internet, that constellation of interconnected computers, transmitted data is split into small 'packets', which makes far more efficient use of the available bandwidth. This, together with easier-to-use technologies, has sharply reduced the cost of electronic publishing, resulting in an estimated daily deluge of over one million webpages being published onto the web (1).
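The idea of splitting data into packets can be illustrated with a minimal sketch. It is not a model of any real network protocol: the function names, the 512-byte payload size and the use of a simple sequence number are all assumptions made for illustration.

```python
def split_into_packets(data: bytes, payload_size: int = 512) -> list[tuple[int, bytes]]:
    """Split data into (sequence_number, payload) pairs of at most payload_size bytes."""
    return [
        (seq, data[offset:offset + payload_size])
        for seq, offset in enumerate(range(0, len(data), payload_size))
    ]


def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Rebuild the original data, whatever order the packets arrived in."""
    return b"".join(payload for _, payload in sorted(packets))


message = b"x" * 1300                  # hypothetical 1300-byte message
packets = split_into_packets(message)  # three packets: 512, 512 and 276 bytes
arrived = list(reversed(packets))      # simulate out-of-order arrival
restored = reassemble(arrived)
```

Because each packet carries its own sequence number, packets can travel independently and still be put back in order at the destination.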

However, despite its uniform interface and seamless linked integration, the web is not a single coherent entity. It has two distinct elements: the 'visible' web and the 'invisible' web. In order to understand the implications of this distinction for information retrieval, it is necessary to first digress into a consideration of how webpages are produced.

Essentially, there are two types of webpage: static and dynamic.

Static web pages are created manually by a web designer and posted onto a web server, where they are available to anyone or anything that visits the website of which they are a part. Any changes must be made manually.

Dynamic web pages are created by a computer using a script (often run via CGI and written in a language such as Perl or Java). This script acts as an intermediary between the user, who requests or submits information on a static web page (the 'front-end'), and a database (the 'back-end'), which supplies or processes the information. The script slots the results into a blank web page template and presents the visitor with a dynamically generated webpage (2). The diagram below illustrates this process:

Fig. 1 Dynamic web page generation
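The process described above can be sketched in a few lines of code. This is a minimal illustration, not the method used by any particular website: Python stands in for the CGI, Java or Perl scripts mentioned in the text, an in-memory SQLite database stands in for the back-end, and the table name, sample records and page template are all hypothetical.

```python
import sqlite3
from html import escape
from string import Template

# Blank web page template (hypothetical) with slots for the query and the results.
PAGE_TEMPLATE = Template(
    "<html><body><h1>Results for '$query'</h1><ul>$items</ul></body></html>"
)


def build_database() -> sqlite3.Connection:
    """Create an in-memory back-end database holding a few sample records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (title TEXT)")
    conn.executemany(
        "INSERT INTO articles (title) VALUES (?)",
        [
            ("Web searching basics",),
            ("Database design",),
            ("Searching the invisible web",),
        ],
    )
    return conn


def generate_page(conn: sqlite3.Connection, query: str) -> str:
    """Query the back-end and slot the matching records into the template."""
    rows = conn.execute(
        "SELECT title FROM articles WHERE title LIKE ? ORDER BY title",
        (f"%{query}%",),
    ).fetchall()
    items = "".join(f"<li>{escape(title)}</li>" for (title,) in rows)
    return PAGE_TEMPLATE.substitute(query=escape(query), items=items)


conn = build_database()
page = generate_page(conn, "search")  # the visitor's enquiry from the front-end
print(page)
```

The resulting page exists only as the response to this one enquiry. A different enquiry produces a different page, and nothing is stored on the web server for a search engine to find.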

Static webpages provide the same generic information to everyone, whilst dynamically generated webpages provide unique information, customised to the user's specific requirements.

Available for everyone to view, and for all search engines to index, static web pages together constitute the 'visible' web. This is the element of the web that researchers at the NEC Research Institute in Princeton, USA, refer to as the 'publicly indexable World Wide Web' in their study 'Accessibility of information on the web' (3).

The 'invisible' web refers to web pages with authorisation requirements, pages excluded from indexing using the robots exclusion meta tag, and information that resides within databases and will only ever be temporarily present on the web as dynamically generated webpages.
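As an illustration of the second category, the sketch below shows how an indexing program might check a page for the robots exclusion meta tag before adding it to an index. It is a simplified example rather than the logic of any actual search engine: the class and function names are hypothetical, and only the 'noindex' and 'none' directives are considered.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma-separated, e.g. "noindex, nofollow".
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def is_indexable(html: str) -> bool:
    """Return False if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return not ({"noindex", "none"} & parser.directives)


open_page = "<html><head><title>Open page</title></head><body>Hello</body></html>"
hidden_page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'

print(is_indexable(open_page))    # no robots meta tag, so the page may be indexed
print(is_indexable(hidden_page))  # the page has opted out of indexing
```

A page that opts out in this way remains publicly viewable, but a well-behaved search engine will leave it out of its index.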

Static web pages          Dynamic web pages
----------------          -----------------
Manually produced         Computer generated
Generic information       Customised information
Most are indexable        Not indexable

Table 1. Comparison of static and dynamically generated webpages

The first NEC study (4) estimated that the 'visible' web contained at least 320M web pages in December 1997, whilst the second study (3) estimated that it had grown to 800M web pages, representing six terabytes of text data, by February 1999. Because the 'invisible' web has a far more disparate structure and range of data types, no scientific research has yet been conducted to determine its size.
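The figures from the two NEC studies can be put into perspective with some simple arithmetic. The calculation below is only indicative: it takes both estimates at face value (the first was stated as a lower bound), assumes steady compound growth over the 14 months between them, and treats a terabyte as 10^12 bytes.

```python
pages_dec_1997 = 320_000_000        # first NEC study (4)
pages_feb_1999 = 800_000_000        # second NEC study (3)
months_elapsed = 14                 # December 1997 to February 1999
text_bytes_feb_1999 = 6 * 10**12    # six terabytes, assuming 1 TB = 10^12 bytes

# Overall growth of the 'visible' web between the two studies.
growth_factor = pages_feb_1999 / pages_dec_1997

# Equivalent compound growth rate per month, assuming steady growth.
monthly_growth = growth_factor ** (1 / months_elapsed) - 1

# Average amount of text per page in the second study.
avg_text_per_page = text_bytes_feb_1999 / pages_feb_1999

print(f"Growth factor: {growth_factor:.1f}x")
print(f"Monthly growth: {monthly_growth:.1%}")
print(f"Average text per page: {avg_text_per_page / 1000:.1f} KB")
```

On these assumptions the 'visible' web grew two-and-a-half-fold between the studies, and an average page carried about 7.5 KB of text.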

However, most publishers distribute their data on the web by integrating huge databases, often gigabytes in size, with a front-end search interface. By virtue of its commercial, professionally published origin, such information is typically of high value and is more highly structured and indexed than that on the 'visible' web. The user's search enquiry generates customised, as opposed to generic, results.

Therefore, for professional researchers, it can be said that information is increasingly accessed via the web, rather than on it.

Nonetheless, the 'visible' web makes a significant contribution to the dissemination of human knowledge and, as the NEC studies acknowledged, 'much of (this) material is not available in traditional databases'. It is no surprise that surveys such as those from Nielsen NetRatings and Media Metrix consistently show that search engines are amongst the most popular destination sites on the web.
