
SUO: Re: Monosemy in a controlled language, disambiguating word senses [was: Informal definitions of ... ]




Whoops, I didn't mean to take this discussion off-list;
I hit the wrong reply button.

Added a URI to Copestake and Briscoe on polysemy
and TELIC qualia below.


Richard Cooper wrote:
> Hi Fred,
> 
> Frederick B. Kintanar wrote:
> 
>>Richard Cooper wrote:
>>
>>>The notion of monosemy in a controlled language
>>>bothers me.  
>>
>>Please try to explain what bothers you.
>>What intuitions are unsatisfied with
>>matching one vocabulary term with one
>>unit of meaning?
> 
> 
> 
> Monosemy is just too rigid to feel like
> normal language.  If I use anything even
> approaching the variation in daily conversations,
> it violates the monosemy rules in John's CLCE.
> 
> Consider "a pretty little girl's school", sort
> of a benchmark phrase in that it can be interpreted
> in many different ways, depending on where you
> place the accent.  
> 
> Or what about Chomsky's counterexample: "Colorless
> green ideas sleep furiously."  Clearly not a 
> natural sentence, yet grammatically correct from
> a purely syntactic point of view.  
> 
> My rough vision is that most daily language
> is repetitive, with moderate variation.  Published
> language is more variable in fiction and much
> more descriptive.  Journal language is precise,
> with long sentences, and lots of conditioning
> of cases.  None of these fit well into a monosemic
> framework.  
> 
> 
> 
>>>I'm looking for ways to disambiguate
>>>polysemous words, now that I have a relational
>>>database of WordNet synsets.  
>>>
>>>One thing that suggests itself is to use
>>>paraphrasing of words in a synset.  That led
>>>me to think about tools to automatically
>>>generate paraphrases from a corpus.
>>
>>I would be interested in characterizing
>>the different senses of a *verb* in terms
>>of type constraints on "participant roles"
>>as evidenced in sample sentences that
>>exemplify that sense.  
> 
> 
> Yes, and if the sample sentences are generated
> from a single initial seed sentence, an observer
> can mark which ones don't make sense, and which
> ones do.  
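
The seed-sentence idea can be pictured with a minimal Python sketch.
It assumes a tiny hand-made synset table; the sense labels and word
lists are invented for illustration and are not WordNet data, and
`variants` is a hypothetical helper name. Each seed word with an
assumed sense is swapped for the other members of that synset, and an
observer would then mark each variant as sensible or not.

```python
from itertools import product

# Toy synsets, invented for illustration; a real run would read them
# from the WordNet synset tables instead.
SYNSETS = {
    "bank.financial": ["bank", "depository"],
    "bank.river": ["bank", "shore"],
    "check.payment": ["check", "cheque"],
}

def variants(seed, sense_choice):
    """Yield paraphrases of `seed` by substituting synset members.

    sense_choice maps a word in the seed to the synset label assumed for it.
    Words without an assumed sense are kept unchanged.
    """
    options = [
        SYNSETS[sense_choice[w]] if w in sense_choice else [w]
        for w in seed.split()
    ]
    for combo in product(*options):
        yield " ".join(combo)

seed = "the bank cleared the check"
financial = list(variants(seed, {"bank": "bank.financial", "check": "check.payment"}))
river = list(variants(seed, {"bank": "bank.river", "check": "check.payment"}))

for sentence in financial + river:
    print(sentence)  # an observer would mark each line as sensible or not
```

If the observer rejects the "shore" variants and accepts the
"depository" ones, that is some evidence for the financial sense of
"bank" in the seed.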
> 
> Normal conversation uses only a few thousand
> words (other than names).  If we could get 
> those few thousand words into a data structure
> of sentences that can be efficiently searched,
> we could make a better syntactic parser with
> the first glimmer of semantic filtering.  
> 
> 
> 
>>I think we would
>>need to constrain both the thematic
>>roles themselves (and John S's proposal
>>in http://www.jfsowa.com/ontology/thematic.htm
>>seems an excellent starting point) and
>>the types of the participants. I'm not
>>sure how much the FrameNet work already
>>addresses this idea.
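
A minimal Python sketch of constraining both the roles and the types
of the participants might look like the following. The type
hierarchy, the sense labels, and the choice of "Agent" and "Theme" as
role names are all assumptions made for illustration; they are not
taken from the thematic roles page or from FrameNet.

```python
# Toy type hierarchy (child -> parent), invented for illustration.
PARENT = {
    "person": "animate",
    "animate": "entity",
    "company": "organization",
    "organization": "entity",
    "money": "entity",
    "river": "entity",
}

def is_a(t, ancestor):
    """True if type t equals ancestor or lies below it in PARENT."""
    while t is not None:
        if t == ancestor:
            return True
        t = PARENT.get(t)
    return False

# Each sense of a verb lists the type required of each participant role.
SENSES = {
    "run": {
        "run.move": {"Agent": "animate"},
        "run.manage": {"Agent": "person", "Theme": "organization"},
        "run.flow": {"Theme": "river"},
    }
}

def matching_senses(verb, participants):
    """Return the senses whose role constraints the participants satisfy.

    participants maps a role name to the type of its filler in the sentence.
    A sense matches only if it names exactly the same roles and every
    filler type falls under the required type.
    """
    hits = []
    for sense, roles in SENSES[verb].items():
        same_roles = set(roles) == set(participants)
        if same_roles and all(is_a(participants[r], req) for r, req in roles.items()):
            hits.append(sense)
    return hits

# "The manager runs the company": a person Agent and a company Theme.
print(matching_senses("run", {"Agent": "person", "Theme": "company"}))
```

Sample sentences that exemplify a sense would supply the role and
type assignments; the table of constraints is what a human or a
corpus study would have to fill in.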
> 
> 
> I haven't seen much published from FrameNet,
> though someday maybe a good tutorial will
> come out of that group on how FrameNet is
> organized.  
> 
> John's thematic roles sound like a good
> starting point, since they seem to be role
> names often used by linguists.  But as a
> nonlinguist, I would like to see a clear
> definition of each word; John's treatment
> assumes you already understand each of
> the role words.  As I study the cases
> more, maybe I'll become familiar with 
> them.  
> 
> 
> 
>>This would produce something like a
>>semantic treebank of exemplifying
>>phrases (perhaps based on simplified
>>sentences with just the head words
>>of the noun phrases, but adding
>>some implicit information about
>>the situation being mentioned, enough
>>to distinguish this word sense from
>>others).
> 
> 
> Yes, phrases that are indexed by WordNet
> synsets could make a searchable data 
> structure of example phrases, including
> sentence forms with variables that can
> represent the noun phrase, verb phrase,
> and other variations in a corpus.  
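
As a rough sketch of such a structure in Python: sentence forms with
noun-phrase variables are stored under a synset label and matched
against new sentences. The labels, the NP1/NP2 variable convention,
and the helper names `add_form` and `lookup` are assumptions for
illustration only, and the matching is plain regular expressions
rather than a real parser.

```python
import re
from collections import defaultdict

# synset label -> compiled sentence forms; NP1, NP2, ... stand for noun phrases.
INDEX = defaultdict(list)

def add_form(synset, form):
    """Store a sentence form such as 'NP1 deposits NP2 in NP3' under a synset."""
    parts = []
    for token in form.split():
        if re.fullmatch(r"NP\d+", token):
            # A variable becomes a named group that captures the phrase.
            parts.append(f"(?P<{token}>.+?)")
        else:
            # Any other token must appear literally.
            parts.append(re.escape(token))
    INDEX[synset].append(re.compile(r"\s+".join(parts)))

def lookup(sentence):
    """Return (synset, variable bindings) for every stored form that matches."""
    hits = []
    for synset, patterns in INDEX.items():
        for pattern in patterns:
            m = pattern.fullmatch(sentence)
            if m:
                hits.append((synset, m.groupdict()))
    return hits

add_form("deposit.bank", "NP1 deposits NP2 in NP3")
add_form("deposit.sediment", "NP1 deposits NP2 on NP3")

print(lookup("the customer deposits a check in the account"))
```

`lookup` returns every form that matches, so an ambiguous sentence
comes back with several candidate synsets rather than one.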
> 
> Then there's still the issue of how
> to choose one interpretation when
> multiple interpretations are possible.
> I've glossed over that issue because
> I don't yet have the data structure to
> experiment with, but that will be a
> tricky problem.  
> 
> 
> 
>>For nouns, qualia like TELIC (which
>>constrains the meaning of nouns that
>>refer to artifacts in relation to some
>>purpose, typically characterized by
>>a verb that fulfills that purpose) might
>>provide a comparable degree of resolution.
> 
> 
> WordNet has no definition of TELIC, and
> a Google search brings up few good-sounding
> hits.  Do you have a URL that might be
> good for explaining TELIC to me?
>
http://www.cl.cam.ac.uk/~aac10/papers/jsem.pdf

Ann Copestake and Ted Briscoe, 1995. Semi-productive
Polysemy and Sense Extension (revised version of
ACQUILEX II WP No. 23). Journal of Semantics, 12, 15-67.

This was also published in _Lexical Semantics:
The Problem of Polysemy_, edited by James Pustejovsky
and Branimir Boguraev, Clarendon, 1996.

> I did find a URL that seems to contradict
> what I've said above:
> http://www.cogsci.ed.ac.uk/~kversp/ftp_html/node165.html
> This author claims that 
> 
> "... an adequate computational lexicon can 
>  only be established on the basis of 
>  top-down design derived from a linguistic 
>  theory in combination with bottom-up 
>  information derived from corpora about 
>  specific usage of language."
> 
> I don't find this a compelling argument,
> except that a simple parse (e.g. link parser)
> could help organize the samples into phrases.
> People don't learn a theory of language 
> and then fill it in.  They learn meanings
> and ways to combine atomic meanings into
> molecular sentences and compound paragraphs.
> There has been little success in NLP
> over the last five decades while following
> a syntax-first approach.  
> 
> It seems to me that the way people learn
> language, the steps we go through, should
> be emulated by a program to see if this
> approach could provide improvement in the
> results.  
> 
> But then the author goes on to describe
> how corpora could be used to deal with
> specific linguistic representation problems,
> and I like his suggestions in that vein.  
> 
> 
> 
>>I'm getting this from the article by
>>Ann Copestake (and somebody else? I
>>don't have the book handy) in Lexical
>>Semantics edited by Pustejovsky and
>>Briscoe (?).
> 
> 
> 
> Googling "Ann Copestake", I found:
> http://www.cl.cam.ac.uk/~aac10/papers/gslt-slides.pdf
> On slide 13, she claims that Terry Winograd's
> SHRDLU program doesn't scale up well.  
> However, I think the NLP problem is really 
> one of engineering effective ways to accomplish
> this scaling.  SHRDLU's limited context has
> to be augmented with more complex contexts,
> and with a library of context frames that
> covers many common experiences.  
> 
> She does discuss question-answering systems
> in several parts of this paper, and that is
> the kind of thing I would like to experiment
> with.  Once a SHRDLU2 is developed that can
> handle a small number of contexts that relate
> to linguistics, and a library of common object
> simulations referenced in daily speech, I think
> the system can start to fill in its own tables
> by searching corpora, accepting corrections
> from observers, and otherwise acting like
> children do as they learn language.  
> 
> 
> 
>>I'm afraid I still don't
>>understand qualia in lexical semantics,
>>and haven't had time to study Corelex.
> 
> 
> 
> Qualia are experiences of sensation, so they
> represent world knowledge, right?  Lexical
> descriptions of qualia are dependent on the
> experiencing subject, and probably do vary
> more than most domains.  
> 
> 
> 
>>I think this approach can be tested
>>in small domains like the text of
>>*definitions* for small standard
>>vocabularies.  
> 
> 
> I agree.  One step up from SHRDLU would
> be a useful experiment.  If well instrumented,
> it could provide a basis for the next step.
> 
> 
> 
>>For example, I have
>>been playing around with the verbs
>>and nouns in the definitions of the
>>15 terms in simple Dublin Core. This
>>could be helpful in ensuring that
>>terms introduced in qualified DC
>>have appropriate relationships with
>>the terms they are refining.
> 
> 
> 
> Dublin Core is very small, so yes I think
> that might be a good one for you to work
> with.  I'm still required to keep my
> experiments work-related, and I can't
> justify studying DC for that reason.  
> 
> I'm interested in simple financial domains,
> such as processing customer payments, and
> hosting English-like rules that the operators
> develop as they experience the behavior 
> of checks, invoices, bank accounts, customers,
> business units, discounts, and so on.  So
> my experiments are in that kind of financial
> domain.  
> 
> 
> 
>>Fred
> 
> 
> Thanks for your email.  Hopefully we can
> help each other with our work.  
> 
> Rich
> 
>