David Green BA (Hons), PgDipLIS, MCLIP    

Intelligent agents and peer-to-peer searching

Information World Review - Dec 00

As the Internet search genus continues to evolve, it has been speciating into new forms: meta-searching, popularity-based searching, link-analysis searching, natural language searching, search utilities and so on. This diversity and continuing specialisation is an indicator of what we can expect in the near future.

The shape of things to come
Search engines will continue to move away from interpreting statements to identifying the concepts that underlie the user's requests - such natural language searching points towards the day when voice-based searching will become commonplace. In the meantime, classification schemes will be increasingly used to refine users' keyword searches, suggesting alternative words or phrases to the user when presented with ambiguous terms. Specialisation, either through technological innovation or by type of content indexed, will continue apace. For example, many search engines cannot index PDF files, so Adobe developed a PDF search engine.

Effective web searching is no longer a simple issue of which search engine is the best or biggest. It now depends on choosing the appropriate content sources and search tool according to the requirements of each query - not unlike 'traditional' online business and scientific searching. For example, search utilities automate many aspects of gathering market and business intelligence, such as running saved search statements on a regular schedule and providing alerts of changes to particular sites that you are monitoring.
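The alerting side of such a search utility can be pictured with a minimal Python sketch. It assumes the utility keeps a fingerprint of each monitored page from its last run and compares it with the text fetched on the current run. The function names and the choice of SHA-256 are illustrative assumptions, not a description of any particular product.

```python
import hashlib
from typing import Dict, List


def fingerprint(text: str) -> str:
    """Return a SHA-256 digest of the page text, used as a cheap change marker."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_sites(previous: Dict[str, str], current_pages: Dict[str, str]) -> List[str]:
    """List the monitored sites whose content differs from the stored fingerprint.

    previous:      site -> fingerprint saved on the last run (hypothetical store)
    current_pages: site -> page text fetched on this run
    A site with no stored fingerprint is reported too, since it is new to the monitor.
    """
    alerts = []
    for site, text in current_pages.items():
        if previous.get(site) != fingerprint(text):
            alerts.append(site)
    return sorted(alerts)
```

Run on a schedule, a routine like this would only raise an alert when a monitored page has actually changed since the previous visit.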

Insider searcher
Better algorithms and agents are delivering many of the current and forecast improvements in web searching and information delivery. I covered intelligent agents in depth in IWR issue 158 (May 1999). In brief, search agents can learn from our behaviours and independently suggest new information sources to us, delivering truly 'proactive personalisation'. They deliver much of the functionality behind natural language searching and can determine not just the concept behind a sentence or phrase, but also identify other related concepts and the links between them. This, combined with their ability to interrogate far more data than a single human mind could, potentially opens the door to an explosive growth in human learning. New relationships between different concepts can be unveiled through the examination of mind-boggling amounts of information and the correlation of any links within.

Double Agents
However, this utopian weltanschauung is not without complication. There is a growing need for a public debate on the autonomy versus the accountability of intelligent agents. Just how far should their autonomy extend? In a near-future web crawling with intelligent agents, many of which will 'chat' to one another and share information, what boundaries will regulate how much information about our behaviours they divulge to other systems, databases and agents without our knowledge?

Agentware developers would be wrong to dismiss such privacy concerns, preferring instead to will the ascendancy of agents on the web through technological prowess alone. Besides, on this latter point agents have an Achilles' heel: like all computer-based systems, they are critically dependent on clearly defined parameters of operation. Once removed from this defined context, their effectiveness collapses. Such a neatly defined context can't mimic real-world complexity, and agents can therefore never truly attain the personal levels of discrimination that we exercise for ourselves. After all, humans can be wonderfully whimsical and circumstances are never truly predictable.

Mike Lynch of, ahem, Autonomy, was wrong when he flatly commented 'Search is dead'. Sergey Brin of Google was perhaps more astute when he commented that 'people will search more and more, because the diversity of content requires it'. Remember - the search genus is evolving, providing a variety of options to the searcher.

Privacy is already an issue, and in an online world where much crime will migrate from the theft of artefacts to the theft of identity details, it will become vitally important. Agentware developers would do well to incorporate robust, user-determined levels of personal security settings that dictate what information the agents we use can divulge, and to whom. Naturally we would want different security/privacy levels according to which sources we were using. Otherwise concerned users will simply opt for other search tools - such as the newly emerging distributed search applications.
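One way to picture such user-determined settings is the following minimal Python sketch. It assumes each item of personal information carries a sensitivity level and each source is granted a clearance by the user. The three levels, the function name and the data layout are all hypothetical, chosen only for illustration.

```python
from typing import Dict

# Hypothetical sensitivity levels, from least to most private.
PUBLIC, PERSONAL, SENSITIVE = 0, 1, 2


def disclosable(profile: Dict[str, str],
                sensitivity: Dict[str, int],
                clearance: Dict[str, int],
                recipient: str) -> Dict[str, str]:
    """Return only the profile fields the user allows this recipient to see.

    Recipients the user has not listed receive nothing, and fields with no
    declared sensitivity are treated as SENSITIVE (the most restrictive level).
    """
    allowed = clearance.get(recipient, -1)  # -1 is below every level
    return {
        field: value
        for field, value in profile.items()
        if sensitivity.get(field, SENSITIVE) <= allowed
    }
```

The design choice worth noting is that both defaults err on the side of privacy: an agent built this way would reveal nothing unless the user had explicitly permitted it.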

Peering through a window to the future
Just as the nascent explosion of the web was enabled by open IP standards, its ability to self-organise with agents will be dependent on the presence of open standards and common protocols. Whilst concerted corporate alliances such as Symbian seek to determine (and thus control) such future open standards, it is likely that the open source movement that brought us Linux, Gnutella and Napster will have a greater impact. The Internet was originally peer-to-peer but it became 'centralised' when it became commercialised. However, peer-to-peer computing, information distribution and indeed infrastructure could be the future.

Almost all of today's search engines are based on a centralised client-server model; many users interrogate a single mainframe server (or proxy copies). Even though FAST uses parallel processing, it nonetheless falls under the 'centralised' category, as there is no decentralised user-to-user interaction. There are several major drawbacks to this prevailing approach:

  • spiders cannot index databases that are integrated into websites
  • they index varying, and often small, percentages of the web
  • search engine indexes often contain out-of-date pointers to content, i.e. dead links
     

Also referred to as distributed file searching, peer-to-peer searching allows users to find the information they require on other users' computers, which provides numerous advantages:

  • Extends the range of searching beyond webservers to the millions of individual computers connected to the Internet at any one time
  • The search is run in a 'real-time environment' so only information that actually exists on other computers will be listed in the results - so no dead links
  • A greater range of file types can be searched - including dynamic data in databases
  • Suitable for locating volatile or temporary information

The downside is that, as the search is running in a 'real-time environment' across many computers rather than against a single search engine database, users will wait much longer for results to appear in their browser - often over a minute. This delay is unlikely to shorten until broadband access becomes much more widely available. The programs that enable such peer-to-peer searching are a new category of search utilities that are referred to as distributed search applications. Examples include Pointera and InfraSearch.
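The trade-off described above can be illustrated with a minimal Python sketch of a Gnutella-style query flood, in which each peer checks its own files and passes the query on to its neighbours until a hop limit is reached. This is an assumption made for illustration, not a description of how Pointera or InfraSearch work. The `Peer` class and `flood_search` function are hypothetical names.

```python
from collections import deque
from typing import Iterable, List, Tuple


class Peer:
    """A hypothetical peer holding a set of file names and links to neighbours."""

    def __init__(self, name: str, files: Iterable[str], online: bool = True):
        self.name = name
        self.files = set(files)
        self.online = online
        self.neighbours: List["Peer"] = []


def flood_search(start: Peer, term: str, ttl: int) -> List[Tuple[str, str]]:
    """Flood a query outwards from `start`, at most `ttl` hops away.

    Only peers that are online at query time are visited, so every hit
    refers to a file that currently exists: there are no dead links.
    The originating peer's own files are not searched.
    """
    hits = []
    seen = {start.name}
    queue = deque([(start, ttl)])
    while queue:
        peer, hops_left = queue.popleft()
        if peer is not start:
            for filename in sorted(peer.files):
                if term in filename:
                    hits.append((peer.name, filename))
        if hops_left == 0:
            continue  # hop budget spent, do not forward further
        for neighbour in peer.neighbours:
            if neighbour.online and neighbour.name not in seen:
                seen.add(neighbour.name)
                queue.append((neighbour, hops_left - 1))
    return hits
```

Every extra hop widens the search but adds another round of network traffic, which is why results arrive so much more slowly than from a single pre-built index.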

Whilst big brand names such as Yahoo! and Amazon will continue to pull in new users on the basis of the trust their brand name implies, web searching is still a wide-open field. Relative newcomers such as Google, then FAST and later WebTop have all demonstrated that it is possible to rise to prominence very quickly. Whilst I have outlined some of the possible future developments in this article, the rest may yet prove to be anyone's guess.

Related article: The semantic web, Information World Review, Dec 02

Related material: The evolution of web searching - Section five - Search utilities and Intelligent agents

Information World Review is Europe's leading information industry publication. This article is reprinted in its entirety with permission from Learned Information Europe Ltd. All material copyright Learned Information Europe Ltd.