Thread Links Date Links
Thread Prev Thread Next Thread Index Date Prev Date Next Date Index

SUO: Re: Link Grammar and Parser




John,

I remember a talk on machine translation by Yorick Wilks
a while ago, in which he made an interesting observation:

  - Almost every theory of linguistics has been claimed as
    a basis for some MT system.

  - And the success or failure of the system for the purpose
    of MT is totally unrelated to the underlying theory.

This is a loose paraphrase, since I don't have a copy of
the original talk handy, and Yorick's tongue was partly in
his cheek (as it often is), but something along these lines
can be claimed for a lot of things, including parsers.

JB> ...  And if a grammar formalism has not faced the
 > computational crunch and come through, I would remain
 > skeptical.

For reasons similar to Yorick's, I believe that the theory
or formalism behind a lot of acronyms ending in G is less
relevant than the selection among a variety of computational
techniques.  Following is a nonexhaustive sample:

  - Top-down with backtracking or bottom-up, chart-style.

  - Form of output, such as parse tree, dependency graph,
    feature structure, or some combination or variation.

  - A basic context-free backbone (whether it is called
    context-free or whether it is written in a rule-like
    style is irrelevant) augmented with tests of various
    kinds, which may be called syntactic, semantic,
    pragmatic, or whatever.

  - Tradeoffs between the number of grammar rules (CF or
    otherwise) and the number of patterns or constructs
    associated with the words -- i.e., is it a complex
    grammar with many pages of rules or a simple grammar
    with most structural information in the lexicon.

  - Methods for keeping track of ambiguities and reducing
    or managing then by grouping, testing, marking, etc.

  - Use of background knowledge, corpora, statistics, etc.,
    and at what stage during the parse.

  - And most importantly, is the parser written by a
    superprogrammer or by a newbie who just learned
    language X?  And did the author spend many long
    hours working and reworking it on lots of texts?

There is certainly a connection between the choice of
formalism and the choice of computational techniques,
but it is not completely deterministic.  Just look at
the enormous number of published methods for parsing
context-free grammars.  Other formalisms might have
as many techniques if they had as many people working
on them.

JB> ... But surely in computer science and certainly in AI,
 > the 80s are a long time ago. There were just so many
 > different grammar formalisms being suggested in the 80s
 > because there was less understanding of grammar as
 > information and not as form.

On the contrary, I make the point that the overwhelming
majority of fundamental discoveries in computer science
and AI were made in the 1950s and 60s -- that includes
both hardware and software.  There certainly have been
many new variations on the same themes and a few major
inventions -- but not that many.  That list of parsing
techniques, for example, is mostly from the 1960s and 70s.

In hardware, the parallelism, pipelining, lookahead,
caches, virtual memory, etc., in the latest and greatest
Pentiums and Power PCs were invented in the 1960s -- the
last thirty years of progress was just following Moore's
law of making the circuitry smaller, faster, and cheaper.

In fact, the architecture of IBM's Power PC is based on
John Cocke's original RISC design of 1971 -- which he
based on the approach he took for IBM's first commercial
computers in the 1950s when circuitry was very expensive.

In programming languages, Simula 67 from 1967 was the first
object-oriented language, and it is still in some ways more
advanced than Java or C#.  Bit mapped graphics with pointing
devices were invented in the 1960s -- but they were too
expensive for most people.  Relational databases follow
Ted Codd's papers of 1969 and 1970, and object-oriented
DBs are a throwback to the earlier database technology
Codd was arguing against.

And the semantic web is nothing but RDF (1960s semantic
networks) and OWL (1970s semantic networks) expressed
in XML (a language based on the 1969 GML, but not as
flexible, powerful, or readable as the 1959 LISP).

John