A Wide-Coverage Context-Free Grammar for Icelandic and an Accompanying Parsing System

Vilhjálmur Þorsteinsson, Hulda Óladóttir, Hrafn Loftsson


Abstract
We present an open-source, wide-coverage context-free grammar (CFG) for Icelandic, and an accompanying parsing system. The grammar has over 5,600 nonterminals, 4,600 terminals and 19,000 productions in fully expanded form, with feature agreement constraints for case, gender, number and person. The parsing system consists of an enhanced Earley-based parser and a mechanism to select best-scoring parse trees from shared packed parse forests. Our parsing system is able to parse about 90% of all sentences in articles published on the main Icelandic news websites. Preliminary evaluation with evalb shows an F-measure of 70.72% on parsed sentences. Our system demonstrates that parsing a morphologically rich language using a wide-coverage CFG can be practical.
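To make the reported F-measure concrete, the following is a minimal sketch of how a labeled-bracket precision, recall and F-measure of the kind evalb reports can be computed. It assumes each parse tree has already been reduced to a list of (label, start, end) constituent spans. The function name `bracket_f1` and the example spans are hypothetical and not taken from the paper, and evalb's own parameter-file options (such as punctuation and label handling) are not modeled here.

```python
from collections import Counter


def bracket_f1(gold, predicted):
    """Return (precision, recall, f1) over labeled constituent spans.

    gold and predicted are iterables of (label, start, end) tuples.
    Duplicate spans are counted with multiplicity.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: a predicted span matches at most as many
    # times as it occurs in the gold tree.
    matched = sum((gold_counts & pred_counts).values())
    n_gold = sum(gold_counts.values())
    n_pred = sum(pred_counts.values())
    precision = matched / n_pred if n_pred else 0.0
    recall = matched / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical spans for a four-token sentence.
gold_spans = [("S", 0, 4), ("NP", 0, 1), ("VP", 1, 4), ("NP", 2, 4)]
pred_spans = [("S", 0, 4), ("NP", 0, 1), ("VP", 2, 4)]

precision, recall, f1 = bracket_f1(gold_spans, pred_spans)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

In this toy case two of the three predicted brackets match the four gold brackets, so precision is 2/3 and recall is 1/2.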
Anthology ID:
R19-1160
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Month:
September
Year:
2019
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
1397–1404
URL:
https://aclanthology.org/R19-1160
DOI:
10.26615/978-954-452-056-4_160
Cite (ACL):
Vilhjálmur Þorsteinsson, Hulda Óladóttir, and Hrafn Loftsson. 2019. A Wide-Coverage Context-Free Grammar for Icelandic and an Accompanying Parsing System. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pages 1397–1404, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
A Wide-Coverage Context-Free Grammar for Icelandic and an Accompanying Parsing System (Þorsteinsson et al., RANLP 2019)
PDF:
https://aclanthology.org/R19-1160.pdf