Section one - Defining the web

Online Information Review - Apr 00 - The evolution of web searching

In that constellation of computers known as the Internet, transmitted data is split into small 'packets', which makes far more efficient use of bandwidth. This, together with easier-to-use technologies, has collapsed the costs of electronic publishing, resulting in an estimated daily deluge of over one million webpages of information being published onto the web (1).

However, despite its uniform interface and seamlessly linked integration, the web is not a single coherent entity. It has two distinct elements: the 'visible' web and the 'invisible' web. In order to understand the implications of this distinction for information retrieval, it is necessary first to consider how webpages are produced.

Essentially, there are two types of webpage: static and dynamic. A static webpage is created manually by a web designer, posted onto a web server and made available to anyone or anything that visits the website of which it is a part. Any changes must be made manually.

Dynamic webpages are created by a computer using a script (often a CGI script, written in a language such as Perl, or a Java program). This script acts as an intermediary between the user requesting, or submitting, information on a static webpage (the 'front-end') and a database (the 'back-end'), which supplies, or processes, the information. The script slots the results into a blank webpage template and presents the visitor with a dynamically generated webpage (2). Fig. 1 illustrates this process.
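
The intermediary role of the script can be sketched in a few lines of Python. This is a minimal illustration, not how any particular publisher's system works: an in-memory SQLite database stands in for the 'back-end', and the table name `articles`, its sample rows, the template layout and the function names `build_backend` and `generate_page` are all assumptions made for the example.

```python
import sqlite3
from html import escape
from string import Template

# Blank page template with slots for the query and the results
# (the layout is hypothetical, chosen only for illustration).
PAGE_TEMPLATE = Template(
    "<html><body><h1>Results for '$query'</h1><ul>$items</ul></body></html>"
)


def build_backend():
    """Create a toy back-end database; the table and rows are made up."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (title TEXT)")
    conn.executemany(
        "INSERT INTO articles (title) VALUES (?)",
        [("Web searching basics",), ("Indexing the web",), ("Library catalogues",)],
    )
    return conn


def generate_page(conn, query):
    """Act as the intermediary script: query the database, fill the template."""
    rows = conn.execute(
        "SELECT title FROM articles WHERE title LIKE ? ORDER BY title",
        (f"%{query}%",),
    ).fetchall()
    # Escape everything that goes into the page so that user input
    # cannot inject markup into the generated HTML.
    items = "".join(f"<li>{escape(title)}</li>" for (title,) in rows)
    return PAGE_TEMPLATE.substitute(query=escape(query), items=items)


if __name__ == "__main__":
    backend = build_backend()
    # The page below exists only for as long as this request is served.
    print(generate_page(backend, "web"))
```

The page returned by `generate_page` is never stored on the server. It is assembled afresh for each enquiry, which is why a crawler that only follows links between stored pages would not normally encounter it.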
Fig. 1 Dynamic web page generation

Static webpages provide the same generic information to everyone, whilst dynamically generated webpages provide unique information, customised to the user's specific requirements.

Available for view to everyone, and for indexing by all search engines, static webpages together constitute the 'visible' web. This is the element of the web that researchers at the NEC Research Institute in Princeton, USA, refer to as the 'publicly indexable world-wide web' in their study 'Accessibility of information on the web' (3).

The 'invisible' web refers to webpages with authorisation requirements, pages excluded from indexing by the robots exclusion meta tag, and information that resides within databases and will only ever be temporarily present on the web as dynamically generated webpages.
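
The three criteria above can be expressed as a rough classification rule of the kind an indexing robot might apply. The following Python sketch is illustrative only: the names `RobotsMetaParser` and `classify` are hypothetical, treating HTTP status codes 401 and 403 as 'authorisation required' is a simplification, and using the presence of a query string in the URL as a sign of dynamic generation is a crude assumption rather than a reliable test.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def classify(url, status, html):
    """Label a fetched page as 'visible' or give the reason it is 'invisible'."""
    # Criterion 1: the server demands authorisation (simplified to 401/403).
    if status in (401, 403):
        return "invisible: authorisation required"
    # Criterion 2: the page opts out of indexing via the robots meta tag.
    # "none" is shorthand for "noindex, nofollow".
    parser = RobotsMetaParser()
    parser.feed(html)
    if parser.directives & {"noindex", "none"}:
        return "invisible: excluded by robots meta tag"
    # Criterion 3 (assumption): a query string suggests a database-backed page.
    if urlparse(url).query:
        return "invisible: dynamically generated (assumed from query string)"
    return "visible"


if __name__ == "__main__":
    print(classify("http://example.com/about.html", 200, "<html></html>"))
    print(classify("http://example.com/search?q=web", 200, "<html></html>"))
```

In practice the third case is the hardest to detect, since a robot generally never sees database content at all unless it submits an enquiry through the front-end itself.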
Table 1. Comparison of static and dynamically generated webpages

The first NEC study (4) estimated that the 'visible' web contained at least 320M webpages in December 1997, whilst the second study (3) estimated that it had grown to 800M webpages, representing six terabytes of text data, as of February 1999. Because of its far more disparate structure and range of data types, no scientific research has yet been conducted to determine the size of the 'invisible' web.

However, most publishers distribute their data on the web by integrating huge databases, often gigabytes in size, with a front-end search interface. By virtue of its commercial, professionally published origin, such information is typically of high value and more highly structured and indexed than the 'visible' web. The user's search enquiry generates customised, as opposed to generic, results. Therefore, for professional researchers, it can be said that information is increasingly accessed via the web, rather than on it.

Nonetheless, the 'visible' web makes a significant contribution to the dissemination of human knowledge and, as the NEC studies acknowledged, 'much of (this) material is not available in traditional databases'. It is no surprise that surveys such as Nielsen NetRatings or Media Metrix consistently show that search engines are amongst the most popular destination sites on the web.