Part-of-Speech Tagging on an Endangered Language: a Parallel Griko-Italian Resource

Antonios Anastasopoulos, Marika Lekakou, Josep Quer, Eleni Zimianiti, Justin DeBenedetto, David Chiang


Abstract
Most work on part-of-speech (POS) tagging focuses on high-resource languages, or examines low-resource and active learning settings through simulated studies. We evaluate POS tagging techniques on an actual endangered language, Griko. We present a resource that contains 114 narratives in Griko, along with sentence-level translations in Italian, and provides gold annotations for the test set. Based on a previously collected small corpus, we investigate several traditional methods, as well as methods that take advantage of monolingual data or project cross-lingual POS tags. We show that the combination of a semi-supervised method with cross-lingual transfer is more appropriate for this extremely challenging setting, with the best tagger achieving an accuracy of 72.9%. With an applied active learning scheme, which we use to collect sentence-level annotations over the test set, we achieve improvements of more than 21 percentage points.
Anthology ID:
C18-1214
Volume:
Proceedings of the 27th International Conference on Computational Linguistics
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Editors:
Emily M. Bender, Leon Derczynski, Pierre Isabelle
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
2529–2539
URL:
https://aclanthology.org/C18-1214
Cite (ACL):
Antonios Anastasopoulos, Marika Lekakou, Josep Quer, Eleni Zimianiti, Justin DeBenedetto, and David Chiang. 2018. Part-of-Speech Tagging on an Endangered Language: a Parallel Griko-Italian Resource. In Proceedings of the 27th International Conference on Computational Linguistics, pages 2529–2539, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Part-of-Speech Tagging on an Endangered Language: a Parallel Griko-Italian Resource (Anastasopoulos et al., COLING 2018)
PDF:
https://aclanthology.org/C18-1214.pdf
Code:
antonis/grikoresource