Re: Fw: Intro to natural language processing
Rob,
To a certain extent, I agree with the principle
summarized in the title of your paper:
"Example-based Complexity--Syntax and Semantics
as the Production of Ad-hoc Arrangements of Examples"
Anyone who has worked with NLP knows that it is
not too hard to design a parser that can correctly
parse about 50% of the sentences in well-edited text.
But getting 80% right is much, much harder; getting
90% right is much, much, much, much, much harder;
and nobody, not even a well-trained human, gets 100%.
One reason why 100% is unachievable is that people
can create what I call "nonce grammar" dynamically.
Following is an actual example:
    For this process the following transaction codes
    are used: 32 - loss on unbilled, 72 - gain on
    uncollected, and 85 - loss on uncollected. Any of
    these records that are actually taxes are bypassed.
    Only client types 01 - Mar, 05 - Internal Non/Billable,
    06 - Internal Billable, and 08 - BAS are selected.
Source: http://www.jfsowa.com/pubs/gal4fmf.htm
Examples like this occur in specialized usage of any
kind, ranging from cooking recipes and sports scores
to articles in scientific publications and that popular
challenge for linguists, the Wall Street Journal.
But notice one very interesting feature: the author
who created this pattern used it quite consistently
over the scope of several sentences. A "nonce" parser
might infer a special rule, such as
    TransactionCode -> Integer "-" ShortPhrase
That is, in effect, what people who are familiar with
the subject matter do. Noun-noun constructions are
notoriously difficult to analyze without domain knowledge.
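To make the nonce-rule idea concrete, here is a minimal Python sketch
under stated assumptions. It encodes TransactionCode -> Integer "-"
ShortPhrase as a regular expression, assumes a ShortPhrase is one to
three words ending just before a comma, a period, or the word "and" or
"are", and adopts the rule only when the pattern recurs at least three
times. The names NONCE_PATTERN and infer_nonce_rule, the stop words,
and the threshold are all hypothetical, chosen for illustration; this
is a toy, not a general method for inducing grammar rules.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical surface pattern for the nonce rule
#   TransactionCode -> Integer "-" ShortPhrase
# Assumption: a ShortPhrase is 1-3 words (letters or "/") that ends just
# before a comma, a period, or the stop word "and" / "are".
NONCE_PATTERN = re.compile(
    r"\b(\d+)\s+-\s+"                        # Integer "-"
    r"([A-Za-z/]+(?:\s+[A-Za-z/]+){0,2}?)"   # ShortPhrase, shortest match first
    r"(?=\s*[,.]|\s+(?:and|are)\b)"          # stop at punctuation or a stop word
)


def infer_nonce_rule(text: str, min_support: int = 3) -> Optional[List[Tuple[str, str]]]:
    """Adopt the nonce rule only if the author uses it consistently.

    Returns the (code, phrase) pairs if the pattern occurs at least
    `min_support` times (an arbitrary, hypothetical threshold), else None.
    Codes stay strings so that leading zeros such as "01" survive.
    """
    matches = [(m.group(1), m.group(2)) for m in NONCE_PATTERN.finditer(text)]
    return matches if len(matches) >= min_support else None


# The passage quoted above, joined into one string.
SAMPLE = (
    "For this process the following transaction codes are used: "
    "32 - loss on unbilled, 72 - gain on uncollected, and 85 - loss on "
    "uncollected. Any of these records that are actually taxes are bypassed. "
    "Only client types 01 - Mar, 05 - Internal Non/Billable, "
    "06 - Internal Billable, and 08 - BAS are selected."
)

for code, phrase in infer_nonce_rule(SAMPLE) or []:
    print(code, "->", phrase)
```

On the quoted passage the sketch recovers all seven code/phrase pairs,
including the client types, which follow the same ad-hoc pattern. The
support threshold is one crude way to model the author's consistency:
a single occurrence would not justify adopting a new rule.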
Following is another example in which the implicit
relation depends more on chemistry than linguistics:
    a hydrochloric-acid wash
    a polypeptide wash
Source: http://www.jfsowa.com/ontology/lex1.htm
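A toy Python sketch of why domain knowledge matters here: both
compounds have the same syntactic shape, [modifier head], so only a
domain lexicon can supply the implicit relation. The lexicon entries,
the semantic types Reagent and Sample, the relation labels, and the
function interpret_compound are hypothetical, chosen for illustration
rather than taken from chemistry references or from the cited page.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy domain lexicon: semantic type of each modifier noun.
# These classifications are illustrative assumptions, not facts from the source.
DOMAIN_TYPES: Dict[str, str] = {
    "hydrochloric-acid": "Reagent",
    "polypeptide": "Sample",
}

# Hypothetical rules: (modifier type, head noun) -> implicit relation.
RELATION_RULES: Dict[Tuple[str, str], str] = {
    ("Reagent", "wash"): "instrument",  # assumed: the wash is done WITH the reagent
    ("Sample", "wash"): "object",       # assumed: the wash is applied TO the sample
}


def interpret_compound(modifier: str, head: str) -> Optional[str]:
    """Return the implicit relation in a noun-noun compound, or None.

    Syntax alone gives the same structure for both compounds;
    only the domain lexicon distinguishes them.
    """
    semantic_type = DOMAIN_TYPES.get(modifier)
    if semantic_type is None:
        return None  # no domain knowledge: the relation stays unresolved
    return RELATION_RULES.get((semantic_type, head))


for modifier in ("hydrochloric-acid", "polypeptide", "saline"):
    print(f"a {modifier} wash ->", interpret_compound(modifier, "wash"))
```

An unlisted modifier such as "saline" (a hypothetical test case)
returns None, which is the honest answer for a system that lacks the
relevant domain knowledge.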
In Wittgenstein's terminology, people create new "language
games" dynamically, whenever they find them useful. But
such constructions are not totally arbitrary, since they
commonly contain "chunks" that are recognizable constructions
from the parent language.
John