SUO: Informal definitions of ...
Hi All,
This list has been too quiet for too long, so I thought
maybe I could stir up a topic related to the CLCE
paper John has underway.
"What are the differences between a vocabulary,
a taxonomy, a thesaurus, an ontology, and a
meta-model?" is at:
http://www.metamodel.com/article.php?story=20030115211223271&mode=print
I thought it was a rather clean description
of the differences.
The notion of monosemy in a controlled language
bothers me. I'm looking for ways to disambiguate
polysemous words, now that I have a relational
database of WordNet synsets.
One thing that suggests itself is to use
paraphrasing of words in a synset. That led
me to think about tools to automatically
generate paraphrases from a corpus.
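
A minimal sketch of that synset lookup follows. The two-table schema and the toy rows are assumptions made for illustration, not the layout of any actual WordNet import. For a polysemous word, each synset it belongs to gives a separate group of paraphrase candidates:

```python
import sqlite3

# Hypothetical schema and toy rows; a real WordNet import will differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE synset (synset_id INTEGER PRIMARY KEY, gloss TEXT);
    CREATE TABLE sense  (word TEXT, synset_id INTEGER REFERENCES synset);
""")
conn.executemany("INSERT INTO synset VALUES (?, ?)", [
    (1, "sloping land beside a body of water"),
    (2, "a financial institution that accepts deposits"),
])
conn.executemany("INSERT INTO sense VALUES (?, ?)", [
    ("bank", 1), ("riverbank", 1),
    ("bank", 2), ("depository", 2),
])

def paraphrase_candidates(word):
    """Map each synset containing `word` to the other words in that synset."""
    rows = conn.execute(
        """
        SELECT s1.synset_id, s2.word
        FROM sense AS s1
        JOIN sense AS s2 ON s2.synset_id = s1.synset_id
        WHERE s1.word = ? AND s2.word <> ?
        ORDER BY s1.synset_id, s2.word
        """,
        (word, word),
    ).fetchall()
    candidates = {}
    for synset_id, other in rows:
        candidates.setdefault(synset_id, []).append(other)
    return candidates

print(paraphrase_candidates("bank"))
```

Choosing among the groups still needs context from the sentence; the query only enumerates the options.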
With a data structure of sentences, using
variables to stand for the specific verb, preposition,
noun and so on, it might be possible to represent
most commonly used sentences in a searchable
format using recursive frame analysis.
For example, the free link parser generates
an English parse tree for many well-formed
sentences. Using that parse tree to gradually
fill in terminal vocabularies, it might be
feasible to automatically generate the data
structure so that equivalent semantic meanings
are factored into something like Beth Levin's
English Verb Classes and Alternations.
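
One way to picture such a data structure is sketched below. It assumes hand-tagged (word, category) pairs in place of real parser output, and the names to_frame, build_index and lookup are made up for this sketch. Each sentence is reduced to a frame of numbered variables, and the words that fill the variables are stored under that frame so they can be searched:

```python
from collections import defaultdict

# Hand-tagged (word, category) pairs stand in for parser output.
# The sentences and the N/V/P tags are illustrative assumptions.
SENTENCES = [
    [("Mary", "N"), ("loaded", "V"), ("hay", "N"), ("onto", "P"), ("trucks", "N")],
    [("John", "N"), ("sprayed", "V"), ("paint", "N"), ("onto", "P"), ("walls", "N")],
    [("Mary", "N"), ("loaded", "V"), ("trucks", "N"), ("with", "P"), ("hay", "N")],
    [("Mary", "N"), ("slept", "V")],
]

def to_frame(tagged):
    """Replace each word with a numbered variable for its category."""
    counts = defaultdict(int)
    frame, bindings = [], {}
    for word, cat in tagged:
        counts[cat] += 1
        var = f"{cat}{counts[cat]}"
        frame.append(var)
        bindings[var] = word
    return tuple(frame), bindings

def build_index(sentences):
    """Group the variable bindings of all sentences by their frame."""
    index = defaultdict(list)
    for tagged in sentences:
        frame, bindings = to_frame(tagged)
        index[frame].append(bindings)
    return index

def lookup(index, frame, **constraints):
    """Return the bindings stored under `frame` that match all constraints."""
    return [
        b for b in index.get(frame, [])
        if all(b.get(var) == word for var, word in constraints.items())
    ]

index = build_index(SENTENCES)
FRAME = ("N1", "V1", "N2", "P1", "N3")
for bindings in lookup(index, FRAME, V1="loaded"):
    print(bindings)
```

Searching the frame for V1="loaded" returns both "hay onto trucks" and "trucks with hay", the kind of alternation pair that would then have to be factored into one verb class.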
From a Google search on automatic paraphrase, I found
some interesting papers:
"Verb Paraphrase based on Case Frame Alignment":
http://citeseer.nj.nec.com/cache/papers/cs/26660/http:zSzzSzacl.ldc.upenn.eduzSzPzSzP02zSzP02-1028.pdf/verb-paraphrase-based-on.pdf
is a paper that develops math expressions for choosing
case frames by verb, including a dictionary and using
polysemy to do the paraphrasing.
"Automatic Paraphrase Acquisition from
News Articles" relies on named entities staying
constant across paraphrases of the same event.
http://nlp.cs.nyu.edu/publication/papers/shinyama-hlt02.pdf
"Selecting Features for Paraphrasing Question
Sentences" is a paper on automatically paraphrasing
questions. It defines 12 different types of questions
that can each be paraphrased differently. There is a
pdf link at this url:
http://condor.depaul.edu/~ntomuro/research/paraphrase.html
"Detection of Transfer Errors in Automatic
Paraphrasing" is a discussion of using positive and
negative examples to train a paraphraser to not make
common errors, such as case errors:
http://iplab.aist-nara.ac.jp/coe2003/abstracts/Orals/Atsushi_Fujita.pdf
"Learning to Paraphrase: An Unsupervised Approach
Using Multiple-Sequence Alignment" is an excellent
paper, and even includes an Appendix describing the
algorithm used to produce a data structure of paraphrases:
http://www.cs.cornell.edu/home/llee/papers/statpar.pdf
"Discovering and Comparing Topic Hierarchies" is
a very good paper about how topics can be extracted
from texts (sentences? paragraphs? ...) using a simple
Bayesian expression for co-occurring phrases. Well done:
http://ciir.cs.umass.edu/pubfiles/ir-183.pdf
"Generating Hierarchical Summaries for Web Searches"
is a paper by the same two authors as above:
http://ciir.cs.umass.edu/pubfiles/ir-271.pdf
"Interactive query expansion: a user-based evaluation
in a relevance feedback environment" is a long paper
with lots of material about testing for relevance, grouping
by topic similarity and so on:
http://faculty.washington.edu/efthimis/pubs/Pubs/iqe-jasis/iqe-jasis.html
"Automatic Item Generation via Frame Semantics:
Natural Language Generation of Math Word Problems"
http://www.ets.org/research/dload/ncme03-deane.pdf
is good for generating questions and answers using
templates. It discusses "interslot dependencies" such
as "drive a car", "ride a bike", "fly an airplane", and
so on.
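
As a rough illustration of an interslot dependency (not the paper's own generator), the sketch below lets the vehicle slot pick the verb slot, so a template can never produce "drive a bike". The template text, the names and the hour values are made up for the example:

```python
from itertools import product

# Interslot dependency: the filler of the vehicle slot fixes the verb slot.
VERB_FOR_VEHICLE = {"car": "drive", "bike": "ride", "airplane": "fly"}

# Hypothetical word-problem template.
TEMPLATE = "{person} plans to {verb} {article} {vehicle} for {hours} hours."

def article(noun):
    """Pick 'a' or 'an' from the first letter (good enough for these nouns)."""
    return "an" if noun[0].lower() in "aeiou" else "a"

def generate(people, vehicles, hours):
    """Fill the template for every combination of the free slots."""
    sentences = []
    for person, vehicle, h in product(people, vehicles, hours):
        sentences.append(TEMPLATE.format(
            person=person,
            verb=VERB_FOR_VEHICLE[vehicle],  # dependent slot
            article=article(vehicle),
            vehicle=vehicle,
            hours=h,
        ))
    return sentences

items = generate(["Ann", "Raj"], ["car", "bike", "airplane"], [2, 3])
for sentence in items:
    print(sentence)
```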
"Discovery of Inference Rules for Question Answering"
http://www.cs.ualberta.ca/~lindek/papers/jnle01.pdf
presents an algorithm (DIRT) that uses dependency trees
from the parser (i.e., syntax trees) to find inference
rules. It presents ways of calculating similarity measures
at various levels that might be very useful.
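
To give a feel for path similarity, here is a much simplified sketch. It compares two paths by the words that fill their X and Y slots and combines the two slot scores with a geometric mean. Plain Jaccard overlap is swapped in for the paper's mutual-information weighting, and the slot fillers are invented, so this is not the DIRT measure itself:

```python
from math import sqrt

# Invented slot fillers for three paths; real ones would be counted
# from dependency parses of a corpus.
PATHS = {
    "X solves Y": {
        "X": {"committee", "government", "engineer"},
        "Y": {"problem", "crisis", "dispute"},
    },
    "X finds a solution to Y": {
        "X": {"committee", "government", "team"},
        "Y": {"problem", "crisis", "puzzle"},
    },
    "X eats Y": {
        "X": {"dog", "child"},
        "Y": {"apple", "bone"},
    },
}

def jaccard(a, b):
    """Overlap of two filler sets; a stand-in for a weighted slot similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def path_similarity(p1, p2):
    """Geometric mean of the X-slot and Y-slot similarities."""
    return sqrt(jaccard(p1["X"], p2["X"]) * jaccard(p1["Y"], p2["Y"]))

base = "X solves Y"
for name, slots in PATHS.items():
    if name != base:
        print(name, round(path_similarity(PATHS[base], slots), 3))
```

Paths whose slots attract the same fillers score high and become candidate inference rules; unrelated paths score zero.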
Rich