David Green BA (Hons), PgDipLIS, MCLIP    

The semantic web

Information World Review - Dec 02


When the web starts thinking for itself
By enabling easy, widespread publishing, the web has had massive social consequences - dramatically altering human behaviours and expectations in information retrieval, knowledge sharing and collaborative working. However, the web as it currently exists makes effective searching and data exchange difficult. In September 1998 Tim Berners-Lee, the creator of the web, outlined a vision of how the web could evolve to address this fundamental flaw. His 'Semantic Web Road Map', published on the website of the W3 Consortium (the non-profit body co-ordinating web standards development globally), has helped spawn research activities all over the world to build the standards and infrastructure for a web where low-level information retrieval and processing can be automated.

Meaningful machine manipulation
The semantic web is an extension of the current web, in which data is given well-defined meaning through the use of a series of enabling technologies that are explored in detail below. Documents will be published with 'semantic markup' - that is, markup interpreted not for display (as with HTML) but as an expression of the document's content. This fundamental shift in web publishing will have far-reaching repercussions for web search engines. Rather than visit a search engine and trawl through a flat listing of possible matches, users will be able to issue high-level information requests via a web-enabled device and have a distilled answer delivered to them. The semantic web is intended to complement humans in areas where they do not perform well, such as processing large volumes of information quickly or analysing large texts for certain pieces of information. It will also extend to the 'real' world, where appliances will advertise their functionality through smart chips and tags: mobile phones, for example, will describe their parameters so that web content can be customised on the fly.

Search today | Search tomorrow
Search for term | Search for concept
Historical indexes | Real-time environment
Manual process - explicit instructions via search engine site | Automated process - state high-level goal via PC or other device
Machine display: HTML = content presentation | Machine processing: XML = content meaning
Results - as published on each page | Results - distilled from many sources
Flat listing - relationships between data sets not presented | Visualisation of concept space - data relationships presented
Defined data types - e.g. HTML and PDF | Many data/file types
Constellation of computers | Anything anywhere - distributed, peer-to-peer


The semantic web is based on established technologies such as XML, RDF, Ontologies and Intelligent Agents.

XML defined
XML is widely seen as the successor to HTML. It is a semantically focused open technology that allows far greater possibilities than mere metadata - it allows a publisher to address the meaning of their content. XML enables powerful structured query searching on text web pages, giving the user direct access to the relevant segment of information within a document. Through related formatting standards such as XSL, it also controls how data should be formatted and output. This takes the web page beyond a flat display of data and allows the user to manipulate the data - write data once and publish to any device 'on the fly'. Although publishers create their own arbitrary XML tag structures, XML schemas explain a publisher's structure by defining shared mark-up vocabularies and providing hooks to associate semantics with them. Every major player in the technology industry is touting this XML-driven interoperable future. Indeed, in the database arena, XML has already become a standard approach for distributing data from one application to another.
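As a rough illustration of structured querying, the sketch below uses Python's standard xml.etree.ElementTree module to pull one named segment out of a document. The tag structure (article, section, para) and the helper name find_section_text are invented for this example; a real publisher would define its own vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical publisher-defined tag structure, invented for illustration.
ARTICLE_XML = """
<article>
  <title>The semantic web</title>
  <author>David Green</author>
  <section name="XML defined">
    <para>XML allows a publisher to address the meaning of their content.</para>
  </section>
  <section name="The global brain">
    <para>Populated with agents, the web will act as a global super-organism.</para>
  </section>
</article>
"""


def find_section_text(xml_text, section_name):
    """Return the paragraph texts of the section with the given name."""
    root = ET.fromstring(xml_text)
    # ElementTree supports a limited XPath subset, including attribute predicates.
    section = root.find(f"./section[@name='{section_name}']")
    if section is None:
        return []
    return [para.text for para in section.findall("para")]


# Query by meaning (tag and attribute) rather than by scanning the displayed text.
author = ET.fromstring(ARTICLE_XML).findtext("author")
paragraphs = find_section_text(ARTICLE_XML, "The global brain")
print(author)
print(paragraphs)
```

Because the query targets tags rather than layout, the same lookup keeps working however the document is styled for display.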

Another technology, the Resource Description Framework (RDF), gives meaning to the structure of XML documents. Just as in human language, where meaning is expressed in a sentence composed of subject, verb and object, RDF expresses meaning and relationships between different web pages and concepts through a structure of things, properties and values. For example, David Green (thing) is the author of (property) this and other IWR articles (value). Subject, verb and object (or thing, property and value) are each identified in the document by a Uniform Resource Identifier (URI), which ensures that the words in the document are linked to a unique definition that everyone can access on the web. This enables much greater data interchange between systems.
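The thing/property/value structure can be sketched in a few lines of plain Python. This is only a toy model of the idea, not an RDF library: the Triple type, the values_of helper and every example.org URI are placeholders made up for the illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the "thing"
    predicate: str  # the "property"
    obj: str        # the "value"


# All URIs below are placeholders under example.org, not real identifiers.
DAVID = "http://example.org/people/david-green"
AUTHOR_OF = "http://example.org/terms/authorOf"
ARTICLE = "http://example.org/iwr/2002-12/semantic-web"
OTHER = "http://example.org/iwr/2001-12/xml-and-information-management"

triples = [
    Triple(DAVID, AUTHOR_OF, ARTICLE),
    Triple(DAVID, AUTHOR_OF, OTHER),
]


def values_of(store, subject, predicate):
    """Return every value recorded for the given thing and property."""
    return [t.obj for t in store
            if t.subject == subject and t.predicate == predicate]


# "Which articles is David Green the author of?"
works = values_of(triples, DAVID, AUTHOR_OF)
print(works)
```

Since each part of a statement is a URI rather than a bare word, two systems that share the URIs can merge their statements without guessing what a term means.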

However, whilst RDF allows a publisher to inform a visiting computer which terms it has used to tag the content in a document, different publishers will use different terms or identifiers to express the same concept. Ontologies provide a deeper level of meaning by providing equivalence relations between terms (i.e. term A on my web page expresses the same concept as term B on your web page). An ontology is a file that formally defines relations among terms, typically as a taxonomy and a set of inference rules. By providing such 'dictionaries of meaning' (in philosophy, ontology is the study of the nature of existence), ontologies can improve the accuracy of web searches by allowing a search program to seek out pages that refer to a specific concept rather than just a particular term.
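The difference between searching for a term and searching for a concept can be shown with a small sketch. The term-to-concept table, the page names and both search functions are hypothetical; a real ontology would also carry a taxonomy and inference rules, which are left out here.

```python
# Hypothetical equivalence relations: each term maps to a shared concept id.
TERM_TO_CONCEPT = {
    "author": "concept:creator",
    "writer": "concept:creator",
    "creator": "concept:creator",
    "postcode": "concept:postal-code",
    "zip code": "concept:postal-code",
}

# Hypothetical pages, each tagged with the terms its publisher chose.
PAGES = {
    "page-a": ["author", "postcode"],
    "page-b": ["writer"],
    "page-c": ["zip code"],
}


def concept_of(term):
    """Map a term to its concept; unknown terms stand for themselves."""
    return TERM_TO_CONCEPT.get(term.lower(), term.lower())


def search_by_term(pages, query_term):
    """Today's search: match only pages that use the exact term."""
    return sorted(p for p, terms in pages.items() if query_term in terms)


def search_by_concept(pages, query_term):
    """Concept search: match pages using any term for the same concept."""
    target = concept_of(query_term)
    return sorted(p for p, terms in pages.items()
                  if any(concept_of(t) == target for t in terms))


print(search_by_term(PAGES, "author"))
print(search_by_concept(PAGES, "author"))
```

The term search finds only the page that literally says "author", while the concept search also finds the page tagged "writer".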

Whilst XML, RDF and ontologies provide the basic infrastructure of the semantic web, it is intelligent agents that will realise its power. An intelligent agent is best described as adaptive software that is capable of reasoning and that learns from our behaviours and preferences (thus delivering 'proactive personalisation'). There are many thousands of different agents (or bots, as they are also known), each performing a specific, specialised task (search bots, chatter bots, shopping bots etc). An important aspect of agents is that they are sociable - they can interact and communicate with humans and other agents. In the semantic web, different agents work together to create an 'information value chain' in which the user's search request is 'packet processed' through sub-assemblies of information passed between agents, each adding value to construct the user's answer.

A user will issue a high-level information request. An intelligent agent will then analyse this request and delegate it to other appropriate agents and services that it has identified through service directory advertisements on the web. These agents will distil large amounts of data distributed across the web and progressively reduce it to a small amount of high-value customised information - in other words, the answer! When broken down into a series of explicit search statements and appropriate content sources to search, a simple user information request is revealed to be a complex task. Automating such tasks will result in an ever-larger role for Artificial Intelligence technologies such as agents.
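The delegate-and-distil flow described above might look something like the following sketch. Everything in it is an assumption made for illustration: the two agents return canned results, the SERVICE_DIRECTORY dictionary stands in for service advertisements on the web, and the relevance scores are invented.

```python
from typing import Callable, Dict, List


def news_agent(topic: str) -> List[dict]:
    """Hypothetical specialised agent returning canned, scored results."""
    return [
        {"source": "news", "title": f"News about {topic}", "score": 0.4},
        {"source": "news", "title": f"Interview on {topic}", "score": 0.7},
    ]


def library_agent(topic: str) -> List[dict]:
    """Hypothetical specialised agent returning canned, scored results."""
    return [
        {"source": "library", "title": f"Introduction to {topic}", "score": 0.9},
        {"source": "library", "title": f"History of {topic}", "score": 0.3},
    ]


# Stand-in for a service directory: advertised service -> agent.
SERVICE_DIRECTORY: Dict[str, Callable[[str], List[dict]]] = {
    "news": news_agent,
    "reference": library_agent,
}


def coordinate(topic: str, needed_services: List[str], keep: int = 2) -> List[dict]:
    """Delegate the request to advertised agents, then distil the results."""
    gathered = []
    for service in needed_services:
        agent = SERVICE_DIRECTORY.get(service)
        if agent is None:
            continue  # no agent advertises this service
        gathered.extend(agent(topic))
    # Reduce many results to a small, high-value answer.
    gathered.sort(key=lambda result: result["score"], reverse=True)
    return gathered[:keep]


answer = coordinate("semantic web", ["news", "reference", "video"])
for result in answer:
    print(result["source"], result["title"])
```

The user states only the topic; choosing sources, querying them and filtering the results are all handled by the coordinating agent.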

One key concern is the balance between the autonomy and the accountability of intelligent agents. How much information about our behaviours and content preferences do they divulge to other agents, databases and systems? There is a need to construct boundaries such as user-determined privacy settings to safely contain such interactions. Similarly, agents will need to verify the authenticity of content sources and of other agents they meet through the use of digital signatures - this is of particular concern when much future crime will involve the more profitable theft of personal details rather than artefacts. Recognising this, the Joint Research Centre of the European Commission is building an experimental privacy protection agent using semantic web technology and the W3 Consortium's P3P privacy protocol. The agent will automate part of the process of protecting a user's privacy by comparing privacy policies with the user's privacy preferences.
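Comparing a site's privacy policy with a user's preferences can be sketched as a simple rule check. The field names and values below are hypothetical simplifications; they are not the P3P vocabulary, and this is not how the Joint Research Centre agent is implemented.

```python
# Hypothetical, simplified preference fields chosen for illustration only.
USER_PREFERENCES = {
    "share_with_third_parties": False,
    "max_retention_days": 30,
    "allowed_purposes": {"order-fulfilment", "site-administration"},
}


def policy_violations(policy, preferences):
    """List the ways a site's policy conflicts with the user's preferences."""
    problems = []
    if policy["share_with_third_parties"] and not preferences["share_with_third_parties"]:
        problems.append("shares data with third parties")
    if policy["retention_days"] > preferences["max_retention_days"]:
        problems.append("keeps data too long")
    extra = set(policy["purposes"]) - preferences["allowed_purposes"]
    if extra:
        problems.append("unapproved purposes: " + ", ".join(sorted(extra)))
    return problems


# Hypothetical policy published by a shopping site.
shop_policy = {
    "share_with_third_parties": True,
    "retention_days": 365,
    "purposes": ["order-fulfilment", "marketing"],
}

violations = policy_violations(shop_policy, USER_PREFERENCES)
print(violations)
```

An agent could run a check like this before releasing any personal details, and proceed only when the list of violations is empty.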

The global brain
The semantic web and other developments such as grid computing (where any one computer can tap the power of all computers) and an Internet operating system have given rise to the concept of a 'global brain'. Populated with adaptive, reasoning agents, the web will act as a global super-organism - the brain of society. It will open up humanity's collective knowledge to meaningful analysis by agents, identifying undiscovered relations between concepts and enabling communication of concepts even where there is no commonality of terms. Agents will also dynamically adapt pages and add links to related content, identifying connections between concepts. The 'intelligence' of this dynamic, self-organising web, where popular links become prominent and rarely used links diminish (just like connections between neurons), will gradually arise through the assembly of the limited intelligence of autonomous agents and systems, each operating within defined context parameters.
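The idea that popular links grow prominent while rarely used links fade can be modelled with a simple reinforce-and-decay rule. The update rule, the boost and decay values and the link names are all assumptions chosen for the sketch, not a description of any existing system.

```python
def update_links(weights, followed, boost=1.0, decay=0.9):
    """Decay every link slightly, then reinforce the links that were followed."""
    updated = {link: weight * decay for link, weight in weights.items()}
    for link in followed:
        updated[link] = updated.get(link, 0.0) + boost
    return updated


# Two hypothetical links that start out equally prominent.
weights = {"a->b": 1.0, "a->c": 1.0}

# Over three rounds, users only ever follow the link from a to b.
for _ in range(3):
    weights = update_links(weights, followed=["a->b"])

# Rank links by prominence, strongest first.
ranked = sorted(weights, key=weights.get, reverse=True)
print(ranked)
print(weights)
```

No single step is intelligent, yet the ranking that emerges reflects the collective behaviour of everyone who followed a link.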

The Open Directory project's slogan is 'Humans do it better'. Tim Berners-Lee's vision is that the semantic web will do it better ('it' being low-level information discovery and exchange), thus enabling humans to do better things. This symbiotic intelligence of people, plus computers, plus AI agents offering immediate access to humanity's collective knowledge does sound somewhat utopian. Equally, it raises the spectre of a self-adaptive intelligence that quickly surpasses our ability to comprehend it. Would such a global brain act as a digital dictator under which individuals are secondary to society's demands? Two papers published in the 9th September 99 issue of the scientific journal Nature revealed that the Internet appeared to be 'evolving' rather than following the expected model of random inanimate networks. Quoted in the June 2000 issue of New Scientist, Daniel Dennett, director of the Center for Cognitive Studies at Tufts University in Medford, Massachusetts, commented 'the global communication network is already capable of complex behaviour that defies the efforts of human experts to comprehend'. The semantic web may act as a 'collective memory', augmenting individual brain power and accelerating the pace of human learning and discovery, but we will need to be careful about controlling its development and our dependence on it if we wish to avoid a dystopian digital dictator scenario.

YouTube video
Tim Berners-Lee talks about the semantic web (8.23 mins, July 07)


Related articles:
XML and Information Management, Information World Review, Dec 01
Intelligent agents and peer-to-peer searching, Information World Review, Dec 00

Information World Review is Europe's leading information industry publication. This article is reprinted in its entirety with permission from Learned Information Europe Ltd. All material copyright Learned Information Europe Ltd.
