Re: Fw: Intro to natural language processing
On Sunday 02 January 2005 03:54, John F. Sowa wrote:
> To a certain extent, I agree with the principle
> summarized in the title of your paper:
> "Example-based Complexity--Syntax and Semantics
> as the Production of Ad-hoc Arrangements of Examples"
> Anyone who has worked with NLP knows that it is
> not too hard to design a parser that can correctly
> parse about 50% of the sentences in well-edited text.
> But getting 80% right is much, much harder; getting
> 90% right is much, much, much, much, much harder;
> and nobody, not even a well-trained human, gets 100%.
> One reason why 100% is unachievable is that people
> can create what I call "nonce grammar" dynamically.
> Following is an actual example:
> For this process the following transaction codes
> are used: 32 - loss on unbilled, 72 - gain on
> uncollected, and 85 - loss on uncollected. Any of
> these records that are actually taxes are bypassed.
> Only client types 01 - Mar, 05 - Internal Non/Billable,
> 06 - Internal Billable, and 08 - BAS are selected.
> Source: http://www.jfsowa.com/pubs/gal4fmf.htm
I like "nonce grammar". I take a more extreme position, though. I think every
new sentence is a "nonce grammar" to an extent (unless it is used more than
once and becomes a "term" or an idiom, even, eventually, a word.) The
interesting question is how we create them (as John Bateman pointed out.) My
answer is of course "ad-hoc arrangements of examples." The regularity of
phraseology and syntax for a given language are just symptoms of the tendency
for the same examples to line up repeatedly in (roughly) the same way.
The cognitive innovation made by humans when they first used such examples as
symbols for other things must, of course, have also been of great importance.
Symbolism may be the uniquely linguistic behaviour. But underneath it, and
acting on it, I think we have to recognize generalization over contrasts as
fundamental to the concept of meaning itself (which necessarily underlies any
concept of "symbol") and, crucially, as the engine of _new_ meaning, which is
expressed linguistically in syntax.
As a practical matter, I think we can map habits (frequency of association) in
texts to find words, idioms and, by association with meaning, symbols. But we
must continually generalize over contrastive patterns in texts to generate
(or analyse) new (nonce) texts, and the new meaning for which they can be
used as tokens.
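To make the first half of that concrete, here is a minimal sketch (in Python;
the corpus file name, the count threshold, and the choice of PMI as the
association score are illustrative assumptions on my part, not a claim about
the right measure) of finding "habits" by frequency of association, i.e.
surfacing adjacent word pairs that recur together more often than chance
would predict:

    import math
    from collections import Counter

    def candidate_terms(tokens, min_count=3):
        # Score adjacent word pairs by pointwise mutual information (PMI).
        # Frequent, high-PMI pairs are candidate "habits": terms and idioms.
        unigrams = Counter(tokens)
        bigrams = Counter(zip(tokens, tokens[1:]))
        n = float(len(tokens))
        scored = []
        for (w1, w2), count in bigrams.items():
            if count < min_count:
                continue
            pmi = math.log((count / (n - 1)) /
                           ((unigrams[w1] / n) * (unigrams[w2] / n)))
            scored.append((pmi, count, w1, w2))
        return sorted(scored, reverse=True)

    # "corpus.txt" is a placeholder; any plain-text corpus would do.
    tokens = open("corpus.txt", encoding="utf-8").read().lower().split()
    for pmi, count, w1, w2 in candidate_terms(tokens)[:20]:
        print(w1, w2, "PMI=%.2f  count=%d" % (pmi, count))

That only covers the frequency half, of course. What it cannot do is the
generalization over contrastive patterns, which is exactly where the new
(nonce) texts, and the new meaning they stand for as tokens, have to come
from.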
> But notice one very interesting feature: the author
> who created this pattern used it quite consistently
> over the scope of several sentences. A "nonce" parser
> might infer a special rule, such as
> TransactionCode -> Integer "-" ShortPhrase
> That is, in effect, what people who are familiar with
> the subject matter do. Noun-noun constructions are
> notoriously difficult to analyze without domain knowledge.
> Following is another example in which the implicit
> relation depends more on chemistry than linguistics:
> a hydrochloric-acid wash
> a polypeptide wash
> Source: http://www.jfsowa.com/ontology/lex1.htm
But context can change that: "let the polypeptide wash away(?)" (cf. "pour
the polypeptide wash away.") Domain subjectivity is one thing, but syntax can
create new meaning within the domain too.
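Going back to the "nonce rule" quoted above, here is a throwaway sketch of
that kind of rule (in Python; the regular expression and the sample string
are mine, purely to illustrate the inferred
TransactionCode -> Integer "-" ShortPhrase pattern, not a claim about how
such a parser would actually work):

    import re

    # Toy instance of the inferred nonce rule:
    #   TransactionCode -> Integer "-" ShortPhrase
    # The pattern is only illustrative and would over-match in other contexts.
    nonce_rule = re.compile(r"(\d{2})\s*-\s*([A-Za-z][\w ]*)")

    text = ("For this process the following transaction codes are used: "
            "32 - loss on unbilled, 72 - gain on uncollected, and "
            "85 - loss on uncollected.")

    for code, phrase in nonce_rule.findall(text):
        print(code, "->", phrase.strip())

The interesting part is not writing such a rule by hand, but letting it
emerge from the way the examples line up, and letting it dissolve again when
the surrounding context changes.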
> In Wittgenstein's terminology, people create new "language
> games" dynamically, whenever they find them useful. But
> such constructions are not totally arbitrary, since they
> commonly contain "chunks" that are recognizable constructions
> from the parent language.
They not only contain chunks, they modify them. That is the problem: how to
model when chunks change (or occur in novel contexts).
You can make rules, but in general every context is unique, and highlights
unique regularities in the language. We need to stop fighting that and start
seeing it as an advantage (and as a model for "meaning as perspective").
By limiting our model of meaning to symbols we have been missing the
flexibility of language.