The Interplay Between Lexical and Syntactic Resources in Incremental Parsebanking

Victoria Rosén, Petter Haugereid, Martha Thunes, Gyri S. Losnegaard, Helge Dyvik


Abstract
Automatic syntactic analysis of a corpus requires detailed lexical and morphological information that cannot always be harvested from traditional dictionaries. In building the INESS Norwegian treebank, necessary lexical information is often missing from the morphology or lexicon. The treebank is built by incremental parsebanking: a corpus is parsed with an existing grammar, and the resulting analyses are efficiently disambiguated by annotators. When the intended analysis is unavailable after parsing, the cause is often that necessary information is missing from the lexicon. INESS has therefore implemented a text preprocessing interface where annotators can enter unrecognized words before parsing. These may be words that are unknown to the morphology and/or lexicon, or words that are known but lack important information. Once this information has been added, either during text preprocessing or during disambiguation, the text can be reparsed, and the intended analysis can then be chosen and stored in the treebank. The lexical information added to the lexicon in this way may be of great interest both to lexicographers and to other language technology efforts, and the enriched lexical resource being developed will be made available at the end of the project.
Anthology ID: L14-1064
Volume: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month: May
Year: 2014
Address: Reykjavik, Iceland
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 1617–1624
URL: http://www.lrec-conf.org/proceedings/lrec2014/pdf/1085_Paper.pdf
Cite (ACL):
Victoria Rosén, Petter Haugereid, Martha Thunes, Gyri S. Losnegaard, and Helge Dyvik. 2014. The Interplay Between Lexical and Syntactic Resources in Incremental Parsebanking. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 1617–1624, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
The Interplay Between Lexical and Syntactic Resources in Incremental Parsebanking (Rosén et al., LREC 2014)