Semantics, XML, and XQuery
Some of my comments about the semantic web have been considered negative.
So I am happy to report some remarks that make me sound moderate in
comparison. These remarks, by the way, are primarily about XML and how
its syntactic features have perverted the semantics of databases and
other systems. That is also my major complaint about the semantic web: the
emphasis on XML syntax has perverted the semantics.
The criticisms come from two of my former colleagues at IBM, Chris Date
and Hugh Darwen, and their associates in the database field. Although
Chris's books have done a great deal to popularize SQL, Chris himself has
been highly critical of the many weaknesses of SQL. Those weaknesses
are the result of ill-advised deviations from the original logic-based
architecture defined by Ted Codd. The QUEL language, which Stonebraker designed
for Ingres, was far simpler and more powerful because it was far more
faithful to Codd's original semantics. Unfortunately, the marketing
power of IBM and Oracle rammed SQL down our throats.
Now that SQL has become "Intergalactic Dataspeak", as Stonebraker calls
it, we have no choice. But instead of correcting its mistakes, the folks
who brought us SQL are creating an even more debased language called
XQuery. Following is an article about XQuery by Fabian Pascal, a
colleague of Chris Date:
http://www.dbazine.com/pascal19.shtml
Some excerpts:
FP> As an intersystem, serialized data-exchange format, XML tags are
> "syntactic"; interpretation and manipulation -- semantics, that is
> -- were supposed to come from outside XML, namely from existing and
> future database systems and applications. Hence, the objective was
> a format independent of any particular language or application and
> easily extensible to new and unanticipated kinds of information.
Some points to note:
1. XML evolved from notations for tagging text files (GML, SGML, and HTML),
which were then applied to the task of "serializing" structured data;
i.e., mapping the structure into a linear form that could be sent
across a transmission line. Although that is useful, it is a purely
syntactic transformation. There is nothing semantic about it.
2. The original intent of XML was to provide a "language independent"
approach that could be adapted to any semantics whatever. That is
a laudable goal. But now the XQuery people have perverted that goal
by changing the semantics to make the query language "XML friendly".
3. As an example, Fabian P. quotes Don Chamberlin, the original villain
who destroyed the logical integrity of SQL: "... if an application
is viewed as a source of information in XML format, it is logical to
pose queries against that XML format. This is the basic reason why
a query language for XML data is extremely important in a connected
world." In other words, Chamberlin wants to destroy what little
semantics is left in SQL by forcing it into the XML syntax.
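Point 1 can be illustrated with a small Python sketch. The record, its field names, and the helpers to_xml and from_xml are hypothetical, chosen only for illustration. The same record is serialized as XML and as JSON; both linear forms parse back to the same structure, and renaming every tag to a meaningless letter leaves the parsed shape unchanged. Whatever the tags are supposed to mean must come from outside the syntax.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record; the field names are illustrative only.
record = {"name": "Codd", "employer": "IBM", "year": "1970"}

def to_xml(rec, root_tag="person"):
    """Serialize a flat dict into a linear XML string (hypothetical helper)."""
    root = ET.Element(root_tag)
    for key, value in rec.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")

def from_xml(text):
    """Parse the XML string back into a flat dict of tag -> text."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}

xml_text = to_xml(record)
json_text = json.dumps(record)

# Two different linear syntaxes round-trip to the same structure.
assert from_xml(xml_text) == json.loads(json_text) == record

# Renaming every tag changes nothing the parser can detect: the values and
# their arrangement are identical, so any "meaning" lives outside XML.
renamed = {"a": "Codd", "b": "IBM", "c": "1970"}
assert list(from_xml(to_xml(renamed, root_tag="x")).values()) == list(record.values())

print(xml_text)
```

The parser checks only that tags are well formed and properly nested; it has no way to tell whether <employer> or <b> is the "right" name.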
FP> Querying is a semantic data management function, not data interchange
> (syntactic) function, but one wonders:
>
> * If XML's point is to be language-independent, why an XML-specific
> language?
>
> * If XML's point is to be database-independent, why reinvent the data
> management wheel (and, we shall argue, a "square wheel" at that?)
>
> * If XML is for syntactic interchange, can it be used for semantic data
> management?
>
> * If it can, should it be? Is it cost-effective?
>
> These are the questions that anybody who considers extending XML to
> data management should have addressed upfront. W3C obviously did not.
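FP's second question can be made concrete with another sketch, assuming two small hypothetical XML documents. The element names, attributes, people, and the helper to_relation are illustrative, not taken from any standard or product. Once each document is mapped to a set of tuples, the query is an ordinary join and selection over relations, and nothing in it depends on XML. Python set comprehensions stand in here for relational operations, not for any particular query language.

```python
import xml.etree.ElementTree as ET

# Hypothetical input documents; all names and values are made up.
EMPLOYEES = """<employees>
  <emp id="1" name="Ann" dept="10"/>
  <emp id="2" name="Bob" dept="20"/>
  <emp id="3" name="Cal" dept="10"/>
</employees>"""

DEPTS = """<depts>
  <dept id="10" name="Research"/>
  <dept id="20" name="Sales"/>
</depts>"""

def to_relation(xml_text, fields):
    """Map each child element to a tuple of attribute values (a relation)."""
    root = ET.fromstring(xml_text)
    return {tuple(child.get(f) for f in fields) for child in root}

# After parsing, the data is just sets of tuples; XML plays no further role.
emp = to_relation(EMPLOYEES, ["id", "name", "dept"])
dept = to_relation(DEPTS, ["id", "name"])

# Names of employees in Research: a join on dept id plus a selection.
research = {e_name
            for (_, e_name, e_dept) in emp
            for (d_id, d_name) in dept
            if e_dept == d_id and d_name == "Research"}

print(sorted(research))
```

The design choice is deliberate: the XML is touched once, at the boundary, and the query itself is written against relations.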
Don C> The XML Query Working Group undertook to define a language with
> two alternative syntaxes: a keyword-based syntax called XQuery,
> optimized for human reading and writing, and an XML-based syntax called
> XQueryX, optimized for machine generation. This chapter describes only
> the keyword-based XQuery syntax, which has been the major focus of the
> working group.
>
FP> Given that XML was invented expressly for inter-system exchanges,
> one must also wonder what's the point of "human reading and writing"?
> Chamberlin doesn't seem to distinguish between a serialized data
> exchange format for systems and a database structure for logic
> inferences. Had he understood the distinction, perhaps he would have
> given us a truly relational, well-designed data language for humans,
> not SQL or XQuery.
I agree with FP. My recommended human format is controlled natural
language, and my recommended machine-oriented format is first-order
logic. On a related topic, I recommend another article, "The Myth of
Self-Describing XML," by Eric Browne:
http://www.oceaninformatics.biz/publications/e2.pdf
In his conclusion, Browne summarizes the good features of XML:
EB> XML is very useful for describing the structure of data streams; it
> can tag individual items of data; it can qualify such items; it can
> associate individual items with others in the stream; it can group
> items together; and, augmented with XML-Schema or its alternatives,
> it can allow for constraining, validation, data-typing and other
> structural niceties, that hithertofor have been difficult to achieve
> across a range of operating systems and APIs (Application Programming
> Interfaces). There are many tools available to developers. It looks
> somewhat like HTML. It is supported by a well-funded independent
> international standards organisation. In short, XML is a very useful
> and widely adopted technology.
But those are syntactic features that provide no semantic guidance:
EB> Adoption and acceptance of a technology, when it reaches some
> critical threshold, automatically induces further acceptance,
> irrespective of the merits of the technology. One invariably hears
> "We should adopt this technology because it has become a de facto
> standard". But what is XML a de facto standard for? What should
> we adopt it for? It may well be appropriate for formatting and
> serializing data streams for exchange between systems, but, by
> itself, it certainly is not adequate for semantic interoperability
> amongst heterogeneous systems devoid of a common conceptual model
> of their domain.
This is the issue that Don C. and his ilk fail to understand: they
are trying to derive a "conceptual model" from nothing but syntax --
a misguided and hopeless endeavor.
Even at the syntactic level, XML has serious limitations. As a notation
for tagging texts, GML and SGML imposed very little overhead on storage
space or processing time. But when the amount of tagging overwhelms
the data, the verbosity of XML makes the documents unreadable for humans
and inefficient for computers. See the article by Larry Seltzer:
http://techupdate.zdnet.com/techupdate/stories/main/0,14179,2896005,00.html
LS> I don't know about you, but I'm scared. XML is becoming complex, and
> it was an inherently fat way to represent data to begin with. In fact,
> according to the W3C (the keepers of the Web), "XML is verbose, but
> that is not a problem." This was a conscious decision by XML's
> designers! ... XML is inefficient in three major ways: It uses lots
> of bandwidth, lots of storage, and lots of processing power.
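The overhead Seltzer describes is easy to measure. The following sketch assumes a hypothetical table of three short order records; it serializes the same data as XML and as CSV (used here only as a compact baseline, not as a recommendation) and counts how many characters are markup rather than data. The helpers as_xml and as_csv are illustrative.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical tabular data with short values, typical of exchange feeds.
ROWS = [
    {"id": "1", "qty": "5", "price": "9.99"},
    {"id": "2", "qty": "12", "price": "0.50"},
    {"id": "3", "qty": "7", "price": "3.25"},
]

def as_xml(rows):
    """One element per row, one child element per field."""
    root = ET.Element("orders")
    for row in rows:
        item = ET.SubElement(root, "order")
        for key, value in row.items():
            ET.SubElement(item, key).text = value
    return ET.tostring(root, encoding="unicode")

def as_csv(rows):
    """A header line followed by one comma-separated line per row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Characters of actual data, identical for both serializations.
payload = sum(len(v) for row in ROWS for v in row.values())
xml_text, csv_text = as_xml(ROWS), as_csv(ROWS)

for label, text in (("XML", xml_text), ("CSV", csv_text)):
    print(f"{label}: {len(text)} chars total, "
          f"{len(text) - payload} chars of markup for {payload} chars of data")
```

For these three rows, both forms carry the same 19 characters of data, but the XML text is 186 characters long and the CSV text is 41. The ratio depends on tag and value lengths, so this is an illustration, not a benchmark.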
And following is an article by Antone Gonsalves about the "hot market"
for tools to accelerate XML:
http://www.techweb.com/wire/story/TWB20040408S0010
XML-Acceleration Tools Aim To Speed Web Services-Clogged Networks
AG> Key to the bottleneck that can accompany the use of web services is
> extensible markup language. While XML is the technology's greatest
> strength, it can also be its Achilles' heel... In justifying the need
> for its product, Conformative released this week results of a survey
> of 30 Fortune 500 companies. All the participants in the study, which
> was done with consulting firm ReSolution Market Research, had
> experienced performance problems in large-scale XML projects....
>
> Those measures could fall short, depending on the size of the projects;
> and a separate hardware appliance may be necessary, experts say.
Unbelievable. In the 1970s, I was happily running GML with a hundred
other users of a time-sharing system that had a tiny fraction of
the power of my laptop computer today. But now the "experts" think
we need a "hardware appliance" to speed up XML.
John Sowa