Automated detection and annotation of term definitions in German text corpora
Angelika Storrer | Sandra Wellinghoff
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We describe an approach to automatically detect and annotate definitions for technical terms in German text corpora. This approach focuses on verbs that typically appear in definitions (= definitor verbs). We specify search patterns based on the valency frames of these definitor verbs and use them (1) to detect and delimit text segments containing definitions and (2) to annotate their main functional components: the definiendum (the term that is defined) and the definiens (meaning postulates for this term). On the basis of these annotations we aim to automatically extract WordNet-style semantic relations that hold between the head nouns of the definiendum and the head nouns of the definiens. In this paper, we describe our annotation scheme for definitions and report on two studies: (1) a pilot study that evaluates our definition extraction approach using a German corpus with manually annotated definitions as a gold standard, and (2) a feasibility study that evaluates the possibility of extracting hypernym, hyponym and holonym relations from these annotated definitions.


Exploiting Coreference Annotations for Text-to-Hypertext Conversion
Anke Holler | Jan Frederik Maas | Angelika Storrer
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

The paper describes an annotation scheme for coreference developed within the application context of text-to-hypertext conversion. In this context coreference is used (1) for generating document-internal and cross-document hyperlinks, and (2) for resolving anaphoric expressions in order to achieve cohesive closedness in hypertext nodes. We will argue that for the purpose of cross-document linking it is necessary to separate the annotation of coreference relations from the annotation of anaphoric relations. To account for this requirement, we developed a knowledge-based annotation scheme that relates referential expressions in the text to entities in a knowledge representation, which is modeled using XML Topic Maps.


Converting a Corpus into a Hypertext: An Approach Using XML Topic Maps and XSLT
Eva Anna Lenz | Angelika Storrer
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)


Description and acquisition of multiword lexemes
Angelika Storrer | Ulrike Schwall
Third International EAMT Workshop: Machine Translation and the Lexicon

This paper deals with multiword lexemes (MWLs), focussing on two types of verbal MWLs: verbal idioms and support verb constructions. We discuss the characteristic properties of MWLs, namely non-standard compositionality, restricted substitutability of components, and restricted morpho-syntactic flexibility, and we show how these properties may cause serious problems during the analysis, generation, and transfer steps of machine translation systems. In order to cope with these problems, MT lexicons need to provide detailed descriptions of MWL properties. We list the types of information which we consider the necessary minimum for a successful processing of MWLs, and report on some feasibility studies aimed at the automatic extraction of German verbal multiword lexemes from text corpora and machine-readable dictionaries.


A Reusable Lexical Database Tool for Machine Translation
Brigitte Blaser | Ulrike Schwall | Angelika Storrer
COLING 1992 Volume 2: The 14th International Conference on Computational Linguistics