Zeno Vandenbulcke
2024
Recipe for Zero-shot POS Tagging: Is It Useful in Realistic Scenarios?
Zeno Vandenbulcke | Lukas Vermeire | Miryam de Lhoneux
Proceedings of the Fourth Workshop on Multilingual Representation Learning (MRL 2024)
POS tagging plays a fundamental role in numerous applications. While POS taggers are highly accurate in well-resourced settings, they lag behind when training data is limited or missing. This paper focuses on POS tagging for languages with limited data. We seek to identify favourable characteristics of datasets for training POS tagging models on related languages, without any training on the target language itself, i.e. a zero-shot approach. We investigate both mono- and multilingual models trained on related languages and compare their accuracies. Additionally, we compare these results with those of models trained directly on the target language. We do this for three low-resource target languages, for each of which we select several support languages. Our research highlights the importance of careful dataset selection for developing effective zero-shot POS tagging models. In particular, a strong linguistic relationship between support and target languages, together with high-quality datasets, yields the best results. For extremely low-resource languages, zero-shot training proves to be a viable option.