Fw: Intro to natural language processing



I found this message posted on comp.ai.nat-lang, and thought it
might be one person's good overview of NLP results to date.

Rich


Rob Freeman wrote:
> Andrew Wagner wrote:
>> Hi. I've done some other programming before (e.g. a chess engine), but
>> am unfamiliar with the terminology of NLP. Is there a web site that
>> can get me started? Andrew
>
> Rather than telling you to read a book, I think the most important
> thing to make clear to anyone beginning in NLP is that nobody really
> knows how to do it.
>
> So the first thing to realize is that whichever of the established
> directions you start out in, it will be wrong.
>
> That has its advantages for a beginner, because it means you don't
> need to worry so much about all the experts. Being an expert in NLP
> mostly means someone has gone a lot further the wrong way :-)
>
> On the other hand if you know nothing about the wrong stuff you run
> the risk of repeating it, and you don't want to do that either.
>
> I would say the best idea is to step lightly about all the wrong
> stuff, trying to get an idea of why it is wrong (and bits are likely
> right, too), and keep thinking about what kinds of directions you
> might need to go in if you want to eventually find something which is
> right.
>
> Basically in terms of current wrong stuff you have two major choices:
>
> 1) You can play with rule-based systems: almost working for about 30
> years, and not getting any better fast.
>
> 2) You can play with statistical/data-based models: better optimized
> and more comprehensive simplifications (in the main) of rule-based
> systems, and producing slightly better results for the last 10-15
> years, but also not getting any better fast.
>
> Supposedly underpinning these two major NLP directions there are many
> (dozens?) theoretical linguistic schools (though I'm not sure many
> NLP people give much credence to theoretical linguistics). The major
> threads here are:
>
> 1) Generativism (focus on language as a system for producing strings)
> 2) Functionalism (focus on language as a system of contrasting
> strings)
> 3) Cognitivism (focus on language as a system which associates strings
> and meanings.)
> 4) Corpus (not so much a school of language, as a new resource, but
> not all schools accept that the resource is meaningful; in particular
> Generativism traditionally thinks language cannot be learned from
> observations and must be innate.)
>
> In practice almost all NLP to date has been Generative in flavour
> (focus on language as a system for producing strings). That doesn't mean
> Generativism is right, it is just more popular with computer
> scientists. Probably because it resembles the way they already know
> computer languages work.
>
> There has been a little associative stuff using Neural Networks, too
> (cf. Cognitivism), but really hardly any.
>
> So, you can look at what has been done in NLP (the wrong stuff -
> typically you might write a little HPSG grammar, or train a little
> Hidden Markov Model) but I think the interesting challenges center on
> considering what is wrong with what has been done. What has been done
> in Linguistics is a guide here, but by no means an exhaustive guide.
> (Especially the way these theories have been developed in practice,
> rapidly moving away from their fundamental premises and occupying cosy
> niches where life is not too challenging.)
>
> To give you a taste of where I'm coming from, I think the Functional
> approach is the most ultimately promising of those currently
> available. (Though most of what the school of Functional Linguistics
> has become in practice is not of interest from an NLP point of view,
> being a sort of Linguistics meets Sociology...)
>
> I've played a little implementing NLP systems in terms of the
> fundamental assumptions underlying Functionalism (focus on language as
> a system of contrasting strings). I think it is less wrong than the
> others. There's still plenty of scope left for error though.
>
> Anyway, if all this doesn't make the direction you should take any
> clearer I hope at least it gives you some idea how you should look for
> it.
>
> -Rob Freeman