PoS Tagging, Lemmatization and Dependency Parsing of West Frisian

Wilbert Heeringa, Gosse Bouma, Martha Hofman, Jelle Brouwer, Eduard Drenth, Jan Wijffels, Hans Van de Velde


Abstract
We present a lemmatizer/PoS tagger/dependency parser for West Frisian using a corpus of 44,714 words in 3,126 sentences annotated according to the guidelines of Universal Dependencies version 2. PoS tags were assigned by applying a Dutch PoS tagger either to a Dutch word-by-word translation or to the sentences of a Dutch parallel text. The best results were obtained with word-by-word translations created by the previous version of the Frisian translation program Oersetter. Morphological and syntactic annotations were likewise generated on the basis of a Dutch word-by-word translation. We compared the performance of the lemmatizer/tagger/annotator trained with default parameters to its performance when trained with the parameter values used for training on the LassySmall UD 2.5 corpus. We study the effects of different hyperparameter settings on the accuracy of the annotation pipeline. The Frisian lemmatizer/PoS tagger/dependency parser is released as a web app and as a web service.
Anthology ID:
2022.lrec-1.512
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
4789–4798
URL:
https://aclanthology.org/2022.lrec-1.512
Cite (ACL):
Wilbert Heeringa, Gosse Bouma, Martha Hofman, Jelle Brouwer, Eduard Drenth, Jan Wijffels, and Hans Van de Velde. 2022. PoS Tagging, Lemmatization and Dependency Parsing of West Frisian. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 4789–4798, Marseille, France. European Language Resources Association.
Cite (Informal):
PoS Tagging, Lemmatization and Dependency Parsing of West Frisian (Heeringa et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.512.pdf