Information World Review
- Dec 01
HTML is DEAD. There have been no new standards
developments to HTML for some time now. It is being replaced by XML.
Whilst easy to learn, HTML was limiting in that it only addressed the
design and layout of information, and not its meaning. Given that most
Internet users and systems are primarily concerned with information
retrieval and exchange, this limitation was quite a handicap.
The migration from HTML to XML as the de facto
web publishing mechanism will have far-reaching implications for information
professionals and publishers alike. Originally I was briefed to write
an article on why technologists have dominated XML developments to date
with lesser input from information specialists. In researching this
article I quickly came to the conclusion that this wasn't really such
an issue - much work to date has been on the development and agreement
of open standards. In recent months the World Wide Web Consortium (W3C),
an international industry consortium that sets open standards for the
web, has finally rubber-stamped the remaining related publishing and
linking standards that complement XML. Now XML will move into a new
phase - wide-scale implementation. It is here that information professionals'
skills in information management, classification schema and indexing,
search skills and records management will be called upon.
XML defined
XML is a semantically focused open technology that allows far greater
possibilities than mere metadata. Not only does it enable explicit description
of the content, but through related technology standards (more below)
it allows manipulation of how data should be formatted and output. This
takes the web page beyond a flat display of data and allows the user
to manipulate the data. Every major player in the technology industry
is touting this XML-driven interoperable future. Indeed, in the database
arena, XML has already become the standard approach for distributing
data from one application to another.
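To make the point about explicit description concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. The news item and its element names (article, headline, company, date, body) are invented for illustration and do not come from any vendor's feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical news item: the element names are assumptions made up
# for this example, not part of any published vocabulary.
NEWS_XML = """
<article>
  <headline>Publisher announces XML content feed</headline>
  <company>Example Corp</company>
  <date>2001-12-01</date>
  <body>Customers can integrate the feed into their intranets.</body>
</article>
"""

def describe(xml_text: str) -> dict:
    """Return each child element's tag and text as a dictionary."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}

fields = describe(NEWS_XML)
print(fields["company"])
print(fields["date"])
```

Because each tag names what its content is, rather than how it should look, a program can ask for the company or the date directly. An HTML page holding the same text would only say which parts were headings or paragraphs.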
Business information vendors
The business information industry has consolidated into three
mega-players: Thomson Financial, Reed Elsevier (owners of Lexis-Nexis)
and Factiva. For these vendors, distribution has become a key area of
competitive edge. We no longer think about internal and external sources;
the goal is to aggregate these seamlessly into unified information - the
Enterprise Information Portal. For example, Factiva Select is an XML content
feed that allows corporate customers to host and integrate news into their
intranet environment. In many ways MAID's LiveIntranet product pre-dated
this. However, it was fundamentally flawed in that it was based on MAID's
proprietary InfoSort indexing technology and not on an open standard such
as XML. Nobody wants to be locked into a single supplier if it can be helped.
Information management
Information management can be said to follow a cycle of discovery, acquisition,
cataloguing and dissemination. XML content management systems (e.g.
Interwoven) will allow information managers to centrally manage independent
content stores. Data can be pulled from several sources, aggregated,
and documents (web page or other format) generated 'on the fly'. Agent-based
indexing and retrieval tools such as Autonomy can also add value by
identifying related terms within and between documents and data sets,
and automatically generating XML-based hyperlinks. Just as XML is a technology
standard, there is much scope for it to also become a knowledge management
standard. For example, a taxonomy would be integral to supplying the
rules for automatically tagging internal data in XML.
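As a rough illustration of how a taxonomy might supply tagging rules, the Python sketch below wraps plain text in XML and adds a subject element for every taxonomy category whose term appears in the text. The taxonomy, the element names and the tag_document function are all hypothetical. A production system would need far more than exact word matching.

```python
import xml.etree.ElementTree as ET

# Hypothetical taxonomy: term (lower case) -> category label.
TAXONOMY = {
    "merger": "Corporate finance",
    "acquisition": "Corporate finance",
    "intranet": "Information technology",
}

def tag_document(title: str, text: str, taxonomy: dict) -> ET.Element:
    """Wrap plain text in XML and add one <subject> per matched category."""
    doc = ET.Element("document")
    ET.SubElement(doc, "title").text = title
    ET.SubElement(doc, "body").text = text
    # Normalise words: strip surrounding punctuation and lower-case them.
    words = {w.strip(".,;:()'\"").lower() for w in text.split()}
    # Collect the distinct categories whose terms occur in the text.
    categories = sorted({taxonomy[w] for w in words if w in taxonomy})
    for category in categories:
        ET.SubElement(doc, "subject").text = category
    return doc

doc = tag_document(
    "Deal news",
    "The merger will affect the company intranet.",
    TAXONOMY,
)
print(ET.tostring(doc, encoding="unicode"))
```

The design point is that the classification rules live in the taxonomy, which information professionals maintain, while the tagging code stays generic.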
The inter-operable semantic web
Although XML tags content that scripts can then manipulate in complex
ways, until recently the system interrogating the data needed to know
what each tag was used for. In other words, XML allowed users to add arbitrary
structure to documents without saying what that structure meant. This
has been resolved with the W3C's release of XML Schema, which defines
shared mark-up vocabularies and provides hooks to associate semantics
with them.
To reiterate, the central tenet of XML is
that it addresses semantics. Tim Berners-Lee, director of the W3C and
inventor of the World Wide Web, has been working
on 'the semantic web', which he describes
as an extension of the web as it is today. The semantic web will
allow programs to browse around and exchange data without human intervention,
in effect turning the Internet into a single giant computer. Microsoft
is also placing a multi-million dollar bet on this vision of the near-future
inter-operable Internet with its .NET project. This will allow for the
automatic exchange of content and messages between software programs,
applications and databases and, where appropriate, to people. Clearly
this raises the requirement for verification and authentication of information
sources in order to address data security and personal privacy concerns.
XML Schema, combined with digital signatures and other verification tools,
will allow for better validation and assurance in information
exchange (e.g. e-commerce transactions).
Publishing formats and records management
Another recently issued W3C standard has resolved the other outstanding
impediment to XML's general adoption. Extensible Stylesheet Language
(XSL) makes complex formatting of documents possible. This allows authors
to write once and publish many times and to many platforms, e.g. the same
content formatted for print, web and mobile channels. In the future
documents will be nebulous entities generated 'on the fly'. Automated
personalised editions could be created for each customer. This may allow
for the optimal storage of data, but using virtual data repositories to
generate multiple documents raises a records management issue. As with
any other electronic document management system, there will be a need
to save transactional documents for legal, regulatory or business purposes,
as opposed to saving only the base data elements. These documents must be
as accessible as today's hard-copy documents. This is another area of
XML implementation that information professionals are best placed to
address for their organisations.
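The 'write once, publish many' idea can be sketched in a few lines of Python. This is not XSL itself: Python's standard library has no XSLT processor, so two ordinary functions stand in for two stylesheets. The source document, its element names and the function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical single source document; element names are invented.
SOURCE = (
    "<article>"
    "<headline>XML &amp; publishing</headline>"
    "<body>One source, many outputs.</body>"
    "</article>"
)

def to_html(root: ET.Element) -> str:
    """Render the article for the web channel."""
    headline = root.findtext("headline", default="")
    body = root.findtext("body", default="")
    return f"<h1>{escape(headline)}</h1>\n<p>{escape(body)}</p>"

def to_text(root: ET.Element) -> str:
    """Render the same article as plain text, e.g. for print or mobile."""
    headline = root.findtext("headline", default="")
    body = root.findtext("body", default="")
    return f"{headline.upper()}\n{'=' * len(headline)}\n{body}"

root = ET.fromstring(SOURCE)
print(to_html(root))
print(to_text(root))
```

Both outputs exist only at the moment they are generated, which is the records management issue raised above. If a rendered edition has legal or regulatory weight, the output itself would need to be saved, not just the source data.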
XML also augments developments in peer-to-peer
computing and information exchange and has clear ramifications for Internet
search engines. Whilst there hasn't been room in this article to explore
these issues, you may also find it useful to read a previous XML article
that I wrote in the Feb 99 issue of IWR ('Here come the X files') which
is available from the archive at the IWR website.
Related material: Section six of research
paper 'The evolution of web searching' examines XML
and web searching
Related articles:
The semantic web, Information World Review, Dec 02
Here come the X files, Information World Review, Feb 99
Information World Review is Europe's leading information industry publication.
This article is reprinted in its entirety with permission from Learned
Information Europe Ltd. All material copyright Learned Information Europe
Ltd.