SUO: HTML Tidy and Assertion Grammars
I came across some useful tools. Among them, Tidy is what I
have always wanted for cleaning up the HTML that is generated
by word processors. The following article summarizes
what it does:
http://unixreview.com/articles/2001/0109/0109e/0109e.htm
The official web page for Tidy is:
http://www.w3.org/People/Raggett/tidy/
Free versions of Tidy are available for all major platforms,
such as Linux, Mac, and Unix, as well as legacy systems, such
as Amiga, Atari, and Windows.
The discussion of Tidy has a pointer to another interesting page
that describes Assertion Grammars. They are more flexible than
XML DTDs, and they can be used to process XML files in various ways,
including generating DTDs:
http://www.w3.org/People/Raggett/dtdgen/Docs/
Following is an excerpt from that web page. Note the comment
"a means for documents to be described in terms of an algebra
operating over modules, which in turn are described as collections
of assertions."
The idea of defining a grammar or algebra for operating on modules
is something that could be applied to many kinds of source files,
including CGs, KIF, and any system of ontology. It could be used for
metalevel processing of anything. The idea is worth exploring further.
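
To make the phrase "an algebra operating over modules" concrete, here is
a minimal sketch in Python. It is not the notation used on the Assertion
Grammars page. The Module class, the (parent, child) form of an
assertion, the + and - operators, and the element names are all
assumptions made up for illustration. It only shows how modules treated
as sets of assertions can be combined and then turned into a DTD-like
view of which elements may occur inside which.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """A named collection of assertions (hypothetical representation).

    Each assertion is a (parent, child) pair, read as
    "element `child` may occur inside element `parent`".
    """
    name: str
    assertions: frozenset

    def __add__(self, other):
        # Union: the combined tagset allows whatever either module allows.
        return Module(f"({self.name} + {other.name})",
                      self.assertions | other.assertions)

    def __sub__(self, other):
        # Difference: withdraw the assertions contributed by `other`.
        return Module(f"({self.name} - {other.name})",
                      self.assertions - other.assertions)


def content_models(module):
    """Group assertions by parent, giving a DTD-like list of allowed children."""
    models = {}
    for parent, child in sorted(module.assertions):
        models.setdefault(parent, []).append(child)
    return models


# Illustrative modules; the element names are examples only.
text = Module("text", frozenset({("body", "p"), ("p", "em")}))
images = Module("images", frozenset({("p", "img")}))
forms = Module("forms", frozenset({("body", "form"), ("form", "input")}))

desktop = text + images + forms   # tagset for a desktop browser
voice = desktop - images          # tagset for a voice browser, without images

print(content_models(voice))
```

The same pattern would carry over to other kinds of source files by
changing what counts as an assertion, while keeping the module
operations the same.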
John Sowa
______________________________________________________________________
This document describes experimental work in progress at HP Labs,
Bristol, on formal techniques for describing combinations of modular
tagsets for documents written in XML. The motivation is provided by
the increasing diversity of web browsers, running on desktops,
television, handhelds, cellphones or voice browsers.
The goal is to provide a means for documents to be described in
terms of an algebra operating over modules, which in turn are
described as collections of assertions. It is hoped that this work
will provide an interesting comparison with traditional approaches
based upon Document Type Declarations, and more recent approaches,
such as the drafts published by the W3C XML Schemas working group.