Lydia Hirschberg


Punctuation and automatic syntactic analysis
Proceedings of the Annual meeting of the Association for Machine Translation and Computational Linguistics

In this paper we discuss how algorithms for automatic analysis can take advantage of the information carried by punctuation marks. We neglect stylistic aspects of punctuation because their usage is not universal, and we restrict ourselves to those rules which any punctuation must observe in order to be intelligible. This involves a concept we call “coherence” of punctuation. In order to define “coherence”, we introduce two characteristics, which we prove to be mutually independent, namely “separating power” and “syntactic function”. The separating power is defined by three experimental laws expressing the fact that two punctuation marks of different separating power prevent syntactic links from crossing them to different extents. These laws are defined independently of any particular grammatical character of the punctuation marks or of the attached grammatical syntagms. On the other hand, whichever grammatical system we choose, we may assimilate the punctuation marks to ordinary words, to the extent that we can assign to them a known grammatical character and function, well defined in any particular context. They differ, however, from other words by their large number of homographs and synonyms, i.e., by the fact that almost every punctuation mark can occur with almost every grammatical value, and in quite similar contexts. Syntactic functions in general, and those of the punctuation marks in particular, can be ordered according to an arbitrary scale of decreasing “value” of syntactic links, where the “value” of a link is directly related to the number of syntactic conditions the link must satisfy. The law of coherence, then, shows that in a given context a particular punctuation mark cannot indiscriminately represent all its homographs, so that a certain number of assumptions about its syntactic nature and function can be discarded.
This law can be stated as follows: “When moving from a punctuation mark to its immediate (left or right) neighbor in any text, the separating power cannot increase if the value of the syntactic function increases, and vice versa”. In addition, we review two related topics, namely the stylistic character of punctuation and the necessity and existence of intrinsic criteria of grammaticality, i.e. criteria independent of punctuation. We propose such a criterion, and suggest a formalism related to the parenthesis-free notation of logic.
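
The way the law of coherence discards hypotheses can be illustrated with a minimal Python sketch. It rests on assumptions not made in the abstract: separating power and syntactic value are both encoded as integers on hypothetical scales, and the law is read as forbidding both quantities from increasing together between adjacent marks in either direction. The names `is_coherent` and `coherent_readings` are illustrative only.

```python
from itertools import product


def is_coherent(marks):
    """Check one reading of the law of coherence.

    `marks` is a list of (separating_power, syntactic_value) pairs in
    text order. Both scales are hypothetical integer encodings.
    Assumed reading: between immediate neighbors, power and value must
    not vary in the same direction (checked left-to-right and
    right-to-left at once via the sign of the product).
    """
    for (p1, v1), (p2, v2) in zip(marks, marks[1:]):
        if (p2 - p1) * (v2 - v1) > 0:
            return False
    return True


def coherent_readings(powers, candidate_values):
    """Keep only the value assignments that satisfy `is_coherent`.

    `powers[i]` is the separating power of the i-th mark;
    `candidate_values[i]` lists the syntactic values its homographs
    could take in this context.
    """
    return [
        combo
        for combo in product(*candidate_values)
        if is_coherent(list(zip(powers, combo)))
    ]


# Two adjacent marks with powers 1 and 2, each with candidate values 1 or 2.
print(coherent_readings([1, 2], [[1, 2], [1, 2]]))
# prints [(1, 1), (2, 1), (2, 2)]
```

In this toy case the assignment `(1, 2)` is discarded because both power and value would increase from the first mark to the second, which is the kind of pruning of homograph hypotheses the law is meant to allow.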