Resolving Pronouns for a Resource-Poor Language, Malayalam Using Resource-Rich Language, Tamil.

Sobha Lalitha Devi


Abstract
This paper describes in detail how a resource-rich language can be used to resolve pronouns in a resource-poor language. The resource-rich source language in this study is Tamil and the resource-poor language is Malayalam; both belong to the Dravidian language family. The pronominal resolution system developed for Tamil uses conditional random fields (CRFs). Our approach applies the Tamil language model to Malayalam test data, and we detail the processing that the Malayalam data requires. The syntactic similarity between the two languages is exploited in identifying the features used to build the Tamil language model. The word form (lexical item) is not used as a feature for training the CRFs. Evaluation on Malayalam Wikipedia data shows that the approach is sound and that the results, though not as good as those for Tamil, are comparable.
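The idea of training without word forms can be illustrated with a minimal sketch of delexicalized feature extraction. Everything below is an assumption made for illustration: the annotation fields (`form`, `lemma`, `pos`, `case`), the tag values, the `prev_pos` context feature, and the helper names are hypothetical and are not the feature set used in the paper. The sketch only shows why dropping lexical fields lets annotations from two syntactically similar languages map to the same feature sequence, which a CRF toolkit could then consume.

```python
from typing import Dict, List

# Hypothetical token annotation; all field names are illustrative assumptions.
Token = Dict[str, str]

# Language-specific fields that are dropped before training or testing.
LEXICAL_KEYS = {"form", "lemma"}


def delexicalize(token: Token) -> Dict[str, str]:
    """Keep only the non-lexical annotations of a token."""
    return {k: v for k, v in token.items() if k not in LEXICAL_KEYS}


def sentence_features(sentence: List[Token]) -> List[Dict[str, str]]:
    """Build one feature dict per token, adding the previous token's POS tag."""
    features = []
    for i, token in enumerate(sentence):
        feats = delexicalize(token)
        # "BOS" marks the beginning of the sentence (an assumed convention).
        feats["prev_pos"] = sentence[i - 1]["pos"] if i > 0 else "BOS"
        features.append(feats)
    return features


# Toy transliterated inputs meaning "he came"; the tags are made up.
tamil = [
    {"form": "avan", "pos": "PRP", "case": "nom"},
    {"form": "vandhaan", "pos": "VB", "case": "none"},
]
malayalam = [
    {"form": "avan", "pos": "PRP", "case": "nom"},
    {"form": "vannu", "pos": "VB", "case": "none"},
]

tamil_features = sentence_features(tamil)
malayalam_features = sentence_features(malayalam)
```

In this toy case the two sentences differ in their verb forms but produce identical feature sequences, which is the property that would let a model trained on one language be applied to data from the other.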
Anthology ID:
R19-1072
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Month:
September
Year:
2019
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
611–618
URL:
https://aclanthology.org/R19-1072
DOI:
10.26615/978-954-452-056-4_072
Cite (ACL):
Sobha Lalitha Devi. 2019. Resolving Pronouns for a Resource-Poor Language, Malayalam Using Resource-Rich Language, Tamil. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pages 611–618, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
Resolving Pronouns for a Resource-Poor Language, Malayalam Using Resource-Rich Language, Tamil. (Lalitha Devi, RANLP 2019)
PDF:
https://aclanthology.org/R19-1072.pdf