A Lexical Resource for the Identification of “Weak Words” in German Specification Documents

Jennifer Krisch, Melanie Dick, Ronny Jauch, Ulrich Heid


Abstract
We report on the creation of a lexical resource for identifying potentially unspecific or imprecise constructions in German requirements documentation from the car manufacturing industry. In requirements engineering, such expressions are called “weak words”: they are not precise enough to ensure an unambiguous interpretation by the contractual partners, who typically rely on specification documents to define their cooperation (Melchisedech, 2000). Examples are dimension adjectives such as kurz or lang (‘short’, ‘long’), which need to be modified by adverbials indicating the exact duration, size, etc. Contrary to standard practice in requirements engineering, where weak words are identified merely on the basis of stopword lists, we identify weak uses in context by querying annotated text. The queries are part of the resource, as they define the conditions under which a word use is weak. We evaluate the recognition of weak uses on our development corpus and on an unseen evaluation corpus, reaching stable F1-scores above 0.95.
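The difference between a stopword-list lookup and a context-sensitive check can be illustrated with a minimal sketch. It is not the authors' query set or tooling: the Token class, the STTS-style tags, the unit list, and the single rule (a dimension adjective is treated as precise only when directly preceded by a number and a unit noun) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Token:
    form: str   # surface form
    lemma: str  # base form
    pos: str    # part-of-speech tag (STTS-style tags assumed here)


# Hypothetical word lists; a real resource would be far larger.
DIMENSION_ADJECTIVES = {"kurz", "lang"}
UNIT_LEMMAS = {"Sekunde", "Minute", "Millisekunde", "Meter", "Millimeter"}


def is_weak_use(tokens, i):
    """Return True if token i is a dimension adjective without a measure phrase."""
    if tokens[i].lemma not in DIMENSION_ADJECTIVES:
        return False
    # Assumed rule: "<number> <unit> <adjective>" counts as a precise use.
    has_measure = (
        i >= 2
        and tokens[i - 2].pos == "CARD"
        and tokens[i - 1].lemma in UNIT_LEMMAS
    )
    return not has_measure


def find_weak_uses(tokens):
    """Return the indices of all weak uses in an annotated sentence."""
    return [i for i in range(len(tokens)) if is_weak_use(tokens, i)]


# "Die Taste muss lang gedrückt werden" (no duration given)
vague = [
    Token("Die", "die", "ART"),
    Token("Taste", "Taste", "NN"),
    Token("muss", "müssen", "VMFIN"),
    Token("lang", "lang", "ADJD"),
    Token("gedrückt", "drücken", "VVPP"),
    Token("werden", "werden", "VAINF"),
]

# "Die Taste muss 3 Sekunden lang gedrückt werden" (duration given)
precise = [
    Token("Die", "die", "ART"),
    Token("Taste", "Taste", "NN"),
    Token("muss", "müssen", "VMFIN"),
    Token("3", "3", "CARD"),
    Token("Sekunden", "Sekunde", "NN"),
    Token("lang", "lang", "ADJD"),
    Token("gedrückt", "drücken", "VVPP"),
    Token("werden", "werden", "VAINF"),
]

print(find_weak_uses(vague))
print(find_weak_uses(precise))
```

A plain stopword list would flag lang in both sentences; the contextual rule flags only the first, where no duration is specified.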
Anthology ID:
L16-1454
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2846–2850
URL:
https://aclanthology.org/L16-1454
Cite (ACL):
Jennifer Krisch, Melanie Dick, Ronny Jauch, and Ulrich Heid. 2016. A Lexical Resource for the Identification of “Weak Words” in German Specification Documents. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 2846–2850, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
A Lexical Resource for the Identification of “Weak Words” in German Specification Documents (Krisch et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1454.pdf