Constraint-Based Bilingual Lexicon Induction for Closely Related Languages

Arbi Haza Nasution, Yohei Murakami, Toru Ishida


Abstract
The scarcity or absence of parallel and comparable corpora makes bilingual lexicon extraction a difficult task for low-resource languages. Pivot language and cognate recognition approaches have proven useful for inducing bilingual lexicons for such languages. We analyze the features of closely related languages and define a semantic constraint assumption. Based on this assumption, we propose a constraint-based bilingual lexicon induction method for closely related languages that extends the constraints and translation pair candidates of a recent pivot language approach. We further define three constraint sets based on language characteristics. We conduct two controlled experiments: the first involves four closely related language pairs with different degrees of language pair similarity, and the second focuses on sense connectivity between non-pivot words and pivot words. We evaluate the results with F-measure. The results indicate that our method works better on large input dictionaries and on highly similar languages. Finally, we introduce a strategy for choosing appropriate constraint sets for different goals and language characteristics.
Anthology ID:
L16-1524
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3291–3298
URL:
https://aclanthology.org/L16-1524
Cite (ACL):
Arbi Haza Nasution, Yohei Murakami, and Toru Ishida. 2016. Constraint-Based Bilingual Lexicon Induction for Closely Related Languages. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 3291–3298, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Constraint-Based Bilingual Lexicon Induction for Closely Related Languages (Nasution et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1524.pdf