
SUO: Re: CG: Graph databases




Hi "theoreticians",

I can only concur with Philippe's statements below, and would even add that
the case is probably much worse than he warns.

Whether it is for handling CGs or any other kind of knowledge base or ontology,
a most important component of the system amounts to an "inference engine",
no matter what you call it, and THIS is the limiting factor of the system,
the CORE of it, as John Sowa noticed in:

   "http://suo.ieee.org/email/msg06369.html"

If you don't have a "state of the art" inference engine you end up

> trying to prove that China isn't a soft drink 

(Bill Andersen: "http://suo.ieee.org/ontology/msg04155.html")

To exemplify the difficulties I will give you an account
of WHAT the current "state of the art" is.

Being currently busy hacking the sources of Otter
("http://www-unix.mcs.anl.gov/AR/otter/"), I noticed that one of the
participants in the latest CADE ATP System Competition
("http://www.math.miami.edu/~tptp/CASC/18/")
is still using the Otter system *AS IS* as the basis of his own prover:
("http://www.math.miami.edu/~tptp/CASC/18/SystemDescriptions.html#SCOTT")

Unfortunately, although Otter was among the top performers only 6 years ago,
it is now a has-been, used only as a convenient yardstick to gauge
the level of other contenders.
Consequently the SCOTT prover, in spite of clever attempts
at "semantic guidance", gets ridiculous results:
("http://www2.cs.man.ac.uk/~tptp/ResultsPlots.html")

This is to point out that the very basic engine and data structures in
such applications have a DECISIVE importance.

And you are looking forward to using an RDBMS of whatever flavor to store
your ontology or knowledge base?

Get real, gentlemen: if you don't get closer to the CURRENT "state of the art"
your systems will just be *crap*, USELESS crap!

I can assure you that NO inference engine of realistic performance
can be built on top of an SQL driven database.

Don't get me wrong, I am not saying that "vanilla" KBs and ontologies
should play in the same league as the CADE ATP contenders.
But many "everyday" queries are at risk of combinatorial
explosion (China isn't a soft drink, or is it? and what does this
have to do with a query about someone's age?), and THIS is the difficulty
to be dealt with, not shied away from.

Of course this is not the "well received", "Tim's blessed" trend.
It is just the opposite, choking the capabilities of the knowledge 
representation to meet the poor capabilities of "archaic" inference 
engines (read again and again Philippe's remarks below).

This is a DEAD END.

Goodbye...

-- Jean-Luc Delatre
-----------------------------------------------------------------------------
"The fact that an opinion has been widely held is no evidence whatever that
 it is not entirely absurd, indeed, in view of the silliness of the majority
 of mankind, a widespread belief is more likely to be foolish than sensible."
        -Bertrand Russell 
-----------------------------------------------------------------------------
 http://perso.club-internet.fr/jld/  -- GSM: +33 6 11 24 06 29

Philippe Martin wrote:
 
> About the discussion on RDBMSs for storing CGs:
> I fail to see the interest of using RDBMSs (as opposed to OODBMSs,
> deductive DBMSs, ...) for developing reasonably complex/expressive
> KR systems. Indeed, structures have to be flattened and scattered to
> fit the RDBMS model and this reduces understandability, ease of development,
> maintainability and, very often, performance.
> I do not understand the argument "we'd better use SQL-accessible databases
> because SQL is the most common access language" because, given the very
> complex and ad-hoc way the structures have to be flattened and scattered,
> I do not think that other RDBMSs will be able to access and navigate the
> structures. Only reading the database schema is insufficient to understand
> how the structures are stored. The SQL queries to access the structures are
> complex and need to be generated (as is the case in Bernd Groh's approach)
> in accordance with the ad-hoc way the structures have been stored. This
> requires a lot of implementation work. Hence, no other system is likely
> to access the database.
> Even with an OODBMS, when the structures to store are complex, the storage
> has to be complex/ad-hoc, and the database is unlikely to be exploitable by
> another KR system exploiting the same OODBMS. For example, see the data model
> I use (or used since I have now extended it) and which is a minimum for
> storing representations of natural language sentences
> http://www.webkb.org/doc/dataModel.html
> In this model, the (extended) C++ classes are nearly sufficient to explain
> the storage of the ontology part of the KB (types and links). For the storage
> of the graph part (nodes and relations), additional documentation is needed
> (but not yet written).
> 
> The RDF model is a set of triples. It is clearly designed with the aim of
> permitting a straightforward storage in a RDBMS, and Tim Berners-Lee
> acknowledged this at WWW 2002. The result is that
> (i) RDF is a knowledge format that is quite difficult to use and extend for
>     storing reasonably complex knowledge representations in a coherent way,
>     and this is due to the triplet approach, e.g. the multiple reifications
>     that it imposes for storing contexts, sets, quantifiers, ... (how to
>     do them? how to combine them? what does that mean? ...),
> (ii) since it is a very low-level model, there are many, many ways to
>      represent the same thing (e.g. many ways to reify), most of which
>      are very difficult to compare automatically (and hence to retrieve
>      and exploit for logical inferencing).
> (iii) triplets do not even help with storing knowledge in RDBMSs since other
>      structures are required for performance reasons. Quadruplets, quintuplets,
>      sextuplets, ... are necessary for storing contexts, sets, quantifiers, etc.
>      That's the case in Bernd Groh's system, where a reference to a context node
>      can be associated with each concept node, and each type (or instance?)
>      is associated with its transitive supertype coverage.
>      Current RDF database systems seem to use RDBMSs but are excessively simple.
>      They do not even seem to exploit the subtype relationships when answering
>      queries, and during the demos at WWW 2002, this feature was not mentioned
>      (not even as a feature to implement in the future!). It seems clear to me
>      that, in the context of storing and retrieving large amounts of RDF-like
>      structures, Bernd's system is way ahead of the systems of the people
>      related to the W3C. This does not contradict the fact that I find his
>      approach, or any other RDBMS-based approach, inadequate and unscalable
>      for complex knowledge representations; Bernd acknowledged it for "very
>      complex representations" :-).
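
To make point (iii) of Philippe's message concrete, here is a minimal Python
sketch of what "exploiting the subtype relationships when answering queries"
can mean over a set of triples. It is an illustration only, not Bernd Groh's
implementation nor the API of any RDF store: the triples, the predicate names
subtype_of and instance_of, and the functions supertype_closure and
instances_of are all hypothetical. The closure is computed once per type
(the "transitive supertype coverage"), so a query for a type can also match
instances declared under any of its subtypes.

```python
from collections import defaultdict

# Hypothetical toy knowledge base as (subject, predicate, object) triples.
TRIPLES = [
    ("Cola", "subtype_of", "SoftDrink"),
    ("SoftDrink", "subtype_of", "Beverage"),
    ("can42", "instance_of", "Cola"),
    ("bottle7", "instance_of", "SoftDrink"),
    ("China", "instance_of", "Country"),
]


def supertype_closure(triples):
    """Map each type to the set of all its direct and indirect supertypes."""
    direct = defaultdict(set)
    types = set()
    for s, p, o in triples:
        if p == "subtype_of":
            direct[s].add(o)
            types.update((s, o))
    closure = {}
    for t in types:
        seen = set()
        stack = list(direct[t])
        while stack:  # depth-first walk up the type hierarchy
            sup = stack.pop()
            if sup not in seen:  # also guards against cyclic declarations
                seen.add(sup)
                stack.extend(direct[sup])
        closure[t] = seen
    return closure


def instances_of(triples, wanted, closure=None):
    """Return instances of `wanted`.

    With closure=None only exact type matches are found (the naive lookup);
    with a closure, instances of any subtype of `wanted` are found as well.
    """
    result = set()
    for s, p, o in triples:
        if p != "instance_of":
            continue
        if o == wanted or (closure is not None and wanted in closure.get(o, ())):
            result.add(s)
    return result


closure = supertype_closure(TRIPLES)
print(sorted(instances_of(TRIPLES, "Beverage")))           # []
print(sorted(instances_of(TRIPLES, "Beverage", closure)))  # ['bottle7', 'can42']
```

Without the closure the query for Beverage finds nothing, because no instance
is declared directly with that type. Storing the closure next to each type is
one way to avoid recomputing the walk at query time; it is also an extra
structure beyond plain triples, which is the kind of addition (iii) describes.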