Analysing cross-lingual transfer in lemmatisation for Indian languages

Kumar Saurav, Kumar Saunack, Pushpak Bhattacharyya


Abstract
Lemmatization aims to reduce the sparse data problem by relating the inflected forms of a word to its dictionary form. However, most of the prior work on this topic has focused on high resource languages. In this paper, we evaluate cross-lingual approaches for low resource languages, especially in the context of morphologically rich Indian languages. We test our model on six languages from two different families and develop linguistic insights into each model’s performance.
Anthology ID:
2020.coling-main.534
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Donia Scott, Nuria Bel, Chengqing Zong
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
6070–6076
URL:
https://aclanthology.org/2020.coling-main.534
DOI:
10.18653/v1/2020.coling-main.534
Cite (ACL):
Kumar Saurav, Kumar Saunack, and Pushpak Bhattacharyya. 2020. Analysing cross-lingual transfer in lemmatisation for Indian languages. In Proceedings of the 28th International Conference on Computational Linguistics, pages 6070–6076, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
Analysing cross-lingual transfer in lemmatisation for Indian languages (Saurav et al., COLING 2020)
PDF:
https://aclanthology.org/2020.coling-main.534.pdf