-----------------------------------------------------------------------
BROAD AREA COLLOQUIUM FOR
AI-GEOMETRY-GRAPHICS-ROBOTICS-VISION
-----------------------------------------------------------------------
Processing Natural Language without Natural Language Processing
Eric Brill
Microsoft Research
Monday, June 3rd, 2002, 4:45PM
Gates B01
http://robotics.stanford.edu/ba-colloquium/
Abstract
Despite decades of research and development, we can still only create
machines with the most rudimentary natural language processing
capabilities. One of the greatest barriers to advanced natural
language processing is our inability to overcome the linguistic
knowledge acquisition bottleneck. Language appears to be extremely
complex and idiosyncratic. Over the years, there has been an ongoing
debate as to how best to overcome this bottleneck: via better
linguistics or more powerful machine learning. While we have been
debating, the amount of on-line text has ballooned from the ubiquitous
million-word Brown corpus to close to a trillion words accessible on
the Web. Does this change everything? We will describe recent work in
a number of areas, including automatic question answering, automatic
training of grammar checkers, and language modeling, where
state-of-the-art accuracy is achieved using very simple methods whose power
comes entirely from the plethora of text currently available to these
systems.
About the Speaker
Eric Brill is a Senior Researcher in the Machine Learning and Applied
Statistics Group of Microsoft Research. His research interests include
machine learning, string algorithms, natural language processing, and
information retrieval.