Section six - XML

Online Information Review - Apr 00 - The evolution of web searching

Since XML was completed by the World Wide Web Consortium (W3C, the body responsible for developing technical standards for the web) in early 1998, it has attracted an almost hysterically evangelical response. So just what is it, and what are its implications for web searching?

Most web pages are currently produced in HyperText
Mark-up Language (HTML). Whilst HTML's ease of use fuelled its widespread
adoption, it is somewhat limited in that it is primarily concerned with
the design and layout of a webpage, rather than the information that actually
appears on that page. Since a primary use of the web is
information retrieval, this focus on design is something of a drawback.

XML is an open technology that offers tremendous possibilities for electronic publishing, e-commerce, information retrieval and data exchange. It consists of rules that enable anyone to create their own mark-up language. XML describes information using pairs of tags that are nested inside one another to multiple levels. (13) These create a tree structure of nested hierarchies. This convention gives users direct access to just the segment of information they are interested in; for example, hyperlinks can lead to the relevant section of a document rather than to the entire document. It also enables powerful structured searching, akin to database field searching, but on textual web pages.

In other words, XML not only enables explicit description of webpage content, but also gives that content an explicit structure that software can use to manipulate each data set it contains. This enables a small program, such as a JavaScript script, to process the information locally on the user's computer according to their requirements, rather than the user requesting a new web page from the central server. Multiplied across millions of web users, this capability will dramatically reduce the load on web servers and the volume of network traffic. (14)

Because it is based on open standards, XML will allow data exchange between different computer systems regardless of operating system or hardware. As XML is also based on Unicode, a character encoding standard that supports the intermingling of text in all of the world's major languages, it will also allow the exchange of information across national and cultural boundaries. (13)

Using stylesheets written in the Extensible Stylesheet Language (XSL), publishers will also be able to re-purpose their content automatically for different devices. There are even stylesheets that will read the text of a webpage aloud, which is of great benefit to the visually impaired.
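The nested-tag tree and the field-style searching described above can be sketched with Python's standard-library ElementTree parser. This is a minimal illustration, not a description of any particular product: the element and attribute names (`library`, `record`, `title`, `source`, `year`, `lang`), the helper `titles_where` and the second, French-language record are all hypothetical placeholders. The example shows a search on a named field, direct access to one segment of the tree, and non-ASCII text passing through intact.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue: element and attribute names are invented for
# illustration and do not follow any real schema.
DOC = """<?xml version="1.0" encoding="UTF-8"?>
<library>
  <record lang="en">
    <title>The evolution of web searching</title>
    <source>Online Information Review</source>
    <year>2000</year>
  </record>
  <record lang="fr">
    <title>Exemple de notice en français</title>
    <source>Revue hypothétique</source>
    <year>1999</year>
  </record>
</library>"""

# Parse UTF-8 bytes so the encoding declaration is honoured; accented
# (Unicode) characters come back unchanged.
root = ET.fromstring(DOC.encode("utf-8"))


def titles_where(root, field, value):
    """Hypothetical helper for a field-style search: return the titles of
    records whose <field> child has exactly this text (comparable to
    WHERE field = value in a database query)."""
    return [rec.findtext("title")
            for rec in root.iter("record")
            if rec.findtext(field) == value]


# Structured search on a named field rather than on free text.
print(titles_where(root, "year", "2000"))  # ['The evolution of web searching']

# Direct access to one segment of the tree using ElementTree's XPath subset.
print(root.find("record[@lang='fr']/title").text)  # Exemple de notice en français
```

Both queries run entirely on a copy of the document already held locally, which is the kind of client-side processing the text describes. In a browser this would more likely be done in JavaScript; Python is used here only because it gives a compact, runnable illustration.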
However, whilst XML will deliver great benefits for searching, publishing and exchanging information, these benefits will not be realised without some effort:
To facilitate the transition to XML, the W3C
released XHTML 1.0, a hybrid of HTML 4.0 and XML, for review in August
1999. It is highly unlikely that there will ever be an HTML 5.0. Earlier,
in April, IBM launched xCentral, the Internet's first search engine
focused exclusively on XML data. This search engine is available
from IBM's XML website.
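Because XHTML 1.0 recasts HTML 4.0 as XML, an ordinary XML parser can read an XHTML page, whereas markup written in the looser HTML 4.0 style (omitted end tags, unclosed `<br>`) is rejected. The sketch below assumes Python's standard-library ElementTree; the helper name `is_well_formed` and the two sample pages are hypothetical. Note that it checks only XML well-formedness, not validity against the XHTML 1.0 DTDs.

```python
import xml.etree.ElementTree as ET

# Namespace URI defined by the XHTML 1.0 specification.
XHTML_NS = "http://www.w3.org/1999/xhtml"


def is_well_formed(markup: str) -> bool:
    """Hypothetical helper: True if the markup parses as XML.
    Well-formedness is necessary for XHTML, but this does not check
    validity against the XHTML DTDs."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Hypothetical XHTML page: every element is closed, empty elements use "/>".
xhtml_page = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Example</title></head>"
    "<body><p>First paragraph.</p><p>Second<br/>line.</p></body>"
    "</html>"
)
# The same content in HTML 4.0 style: end tags omitted, <br> left open.
html4_page = "<html><body><p>First paragraph.<p>Second<br>line.</body></html>"

print(is_well_formed(xhtml_page))  # True
print(is_well_formed(html4_page))  # False

# Once parsed, XHTML elements are addressed by namespace-qualified names.
root = ET.fromstring(xhtml_page)
paragraphs = root.findall(f".//{{{XHTML_NS}}}p")
print(len(paragraphs))  # 2
```

This is why XHTML can act as a bridge: pages written to it can be processed by the same XML tools, including XML-aware search engines, as any other XML data.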
Related article: XML and Information Management, published in Information World Review, Dec 01