SUO: Re: CG: Question about CGs and OO concepts




Unmesh,

That is a good example of an application that would benefit from
a CG specification.  As a matter of fact, a related application
was actually implemented in CGs in a very short time (once the
appropriate CG software was available).  For a description of the
approach, see Section 5 of the following paper:

    http://www.jfsowa.com/pubs/tosi.htm

This paper describes some CG software implemented by a startup
company, VivoMind LLC.  The project described in Section 5 was an
example of "legacy re-engineering", which involved an analysis of
100 megabytes of English documentation and 1.5 million lines of code
in both COBOL and JCL.

Two colleagues of mine, Arun Majumdar and Andre Leclerc, used the same
parser with different grammars to analyze all three languages, and
they translated the results to CGs as the common intermediate language.
As a result, they were able to detect discrepancies between the 
documentation and the implementation.  From the CGs, they generated
an English glossary, a data dictionary, and specifications for
generating ER diagrams and system flow diagrams.

 > I am working in a project which involves automatic evaluations of some
 > documents. For this purpose we have created a controlled language. The
 > documents are written in this language, and the processing of certain
 > requests is done automatically reading these documents at runtime.

I don't know how ambitious your project is, since the word "evaluation"
could mean many different things.  The legacy re-engineering project
was narrowly focused, since it didn't attempt to "evaluate" the
documentation, but merely to search for and interpret those sentences
that mentioned something about the data, files, and procedures used by
the system.  It ignored all sentences that did not address those topics.

 > All the business logic is implemented in the form of BNF for the
 > documents. And all the processing logic is documented in some other
 > language. For last two years we were actually searching for the best
 > formalism to specify the semantics of the business documents. Because
 > of lack of formal specifications, we are heavily dependent on the BNF
 > for the language and many times need to ask the BNF designers the
 > exact semantics of the BNF construct.

For the project by Majumdar and Leclerc, the three grammars (for
English, COBOL, and JCL) were ignored after the CGs were generated.
The CGs generated from any of the three languages all looked the
same.  Since the English input was syntactically and semantically
unrestricted, it was quite possible that the system might have
misinterpreted some of the inputs.  To detect and prevent such
possibilities, the system used a "measure of evidence" to estimate
its confidence that the interpretation was correct.  If the measure
was below a certain threshold, the system would alert Majumdar or
Leclerc and ask them to check the CGs with the original English input.

 > I thought conceptual graphs can be a better alternative. UML etc are
 > out of question.  The BNF is huge, around 70 A4 size pages with 1400
 > productions.  But I am scared what will be the size of conceptual
 > graph specifications. Also if it is right choice to make? But I think
 > there is no other formal notation available right now which can
 > express Business Process semantics with the Semantic Nets with the
 > Language semantics of the documents.

The size of the CG specification is independent of the size of the
grammar (or even the nature of the grammar, since it can be applied
equally well to natural languages or computer languages).  It only
depends on the size of the semantics.

For the legacy re-engineering project, the size of the semantics
was actually quite small, even though the input English was
unrestricted.  For COBOL, it only analyzed the data division
and the environment division of the COBOL programs, and it ignored
the procedure division, which contained the most complex code.
For JCL, it analyzed the DD cards to determine what files were
being referenced, but it ignored most of the details that were
not relevant to the task.

That is why the title of the paper is "Task-Oriented Semantic
Interpretation" -- the system merely ignores input that is
unrelated to the semantics of the task.  However, it is important
to note that the parser can detect the relevant semantics in
many different syntactic variations.  It completely decouples
the syntax from the semantics and makes it possible to apply the
same semantic interpreter to very different languages.
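
To make that decoupling concrete, here is a minimal Python sketch.  It
is not the VivoMind software: the three regular expressions merely stand
in for the separate grammars, the triple is a toy substitute for a
conceptual graph, and every name in it (uses_file, from_cobol, from_jcl,
from_english, files_used, the "Uses" relation, the sample inputs) is
hypothetical.

```python
import re

# Toy stand-in for a conceptual graph: one (concept, relation, concept) triple.
# The labels "Program", "Uses", and "File" are illustrative assumptions.
def uses_file(program, ddname):
    return (("Program", program), "Uses", ("File", ddname))

# Three hypothetical, heavily simplified front ends, one per source language.
def from_cobol(line, program):
    m = re.search(r"SELECT\s+\S+\s+ASSIGN\s+TO\s+([A-Z0-9-]+)", line, re.IGNORECASE)
    return uses_file(program, m.group(1).upper()) if m else None

def from_jcl(line, program):
    m = re.match(r"//([A-Z0-9#@$]+)\s+DD\s", line)
    return uses_file(program, m.group(1)) if m else None

def from_english(sentence):
    m = re.search(r"the (\w+) program (?:reads|writes|uses) the (\w+) file",
                  sentence, re.IGNORECASE)
    return uses_file(m.group(1).upper(), m.group(2).upper()) if m else None

def files_used(graphs):
    """Semantic step: it sees only triples, never the source syntax."""
    result = {}
    for graph in graphs:
        if graph is None:  # input unrelated to the task is simply ignored
            continue
        (_, program), relation, (_, ddname) = graph
        if relation == "Uses":
            result.setdefault(program, set()).add(ddname)
    return result

graphs = [
    from_cobol("SELECT CUSTOMER-FILE ASSIGN TO CUSTDD.", "PAYROLL"),
    from_jcl("//CUSTDD   DD DSN=PROD.CUSTOMER.MASTER,DISP=SHR", "PAYROLL"),
    from_english("The payroll program reads the CUSTDD file."),
    from_english("This sentence is about something else."),
]
print(files_used(graphs))
```

The first three inputs yield the identical triple, so files_used cannot
tell which language a fact came from, and the fourth input is dropped
because it says nothing about the task.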

For the legacy re-engineering project, a large consulting company
estimated that it would take two years with 40 people to analyze
all the documentation and code and to generate the output in the
desired formats (an English glossary, a data dictionary, and the
input required for E-R diagrams and dataflow charts).  But Majumdar
and Leclerc finished the whole project in 8 weeks:

  - One week for the preliminary discussions and determination of
    what had to be done.

  - Three weeks to customize the software -- translate the BNF for
    COBOL and JCL to Prolog definite-clause grammars, define the
    canonical conceptual graphs for the semantics, and write programs
    to access the data, which was located on 300 different computers
    scattered across North America.

  - Three weeks to run the programs (24 hours a day, 7 days a week
    on a 750 MHz Pentium III).  During that time, the computer used
    the measure of evidence to determine whether the input sentences
    were (1) irrelevant, (2) relevant and correctly interpreted, or
    (3) possibly relevant, but below its threshold of confidence.
    The great majority of the sentences were of type 1, most of
    the rest were of type 2, and only the tiny fraction of type 3
    sentences required human assistance.

  - A couple more days to format the results and put them on a CD-ROM.
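
The three-way sorting of sentences in the third step can be sketched in
a few lines of Python.  This is only an illustration under stated
assumptions: the scores, the two cutoff values, and the names Verdict,
triage, and review_queue are invented here, and nothing above says how
the actual measure of evidence was computed or how irrelevance was
decided.

```python
from enum import Enum

class Verdict(Enum):
    IRRELEVANT = 1    # type 1: ignore the sentence
    ACCEPTED = 2      # type 2: relevant and confidently interpreted
    NEEDS_REVIEW = 3  # type 3: possibly relevant, ask a person

# Hypothetical cutoffs on an evidence score assumed to lie in [0, 1].
RELEVANCE_FLOOR = 0.2
CONFIDENCE_THRESHOLD = 0.8

def triage(evidence):
    """Map an evidence score to one of the three sentence types."""
    if evidence < RELEVANCE_FLOOR:
        return Verdict.IRRELEVANT
    if evidence >= CONFIDENCE_THRESHOLD:
        return Verdict.ACCEPTED
    return Verdict.NEEDS_REVIEW

def review_queue(scored_sentences):
    """Return only the sentences a human should compare with the original."""
    return [text for text, evidence in scored_sentences
            if triage(evidence) is Verdict.NEEDS_REVIEW]

# Made-up sentences and scores, for illustration only.
scored = [
    ("The cafeteria closes at five.", 0.05),
    ("File CUSTMAST holds one record per customer.", 0.95),
    ("The nightly run may update the master.", 0.50),
]
print(review_queue(scored))
```

Only the middle band reaches a person, which is why the human effort
stays small when most sentences fall clearly into type 1 or type 2.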

The entire project was finished in 16 person-weeks (two people for
8 weeks) instead of 80 person-years (40 people for two years).

John Sowa