Cognates and Word Alignment in Bitexts

Grzegorz Kondrak


Abstract
We evaluate several orthographic word similarity measures in the context of bitext word alignment. We investigate the relationship between the length of the words and the length of their longest common subsequence. We present an alternative to the longest common subsequence ratio (LCSR), a widely used orthographic word similarity measure. Experiments involving identification of cognates in bitexts suggest that the alternative method outperforms LCSR. Our results also indicate that alignment links can be used as a substitute for cognates for the purpose of evaluating word similarity measures.
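For readers unfamiliar with LCSR, the following is a minimal sketch of the measure as it is commonly defined: the length of the longest common subsequence of two words divided by the length of the longer word. The function names `lcs_length` and `lcsr` and the example word pair are illustrative assumptions, not taken from the paper, and the sketch covers only the baseline LCSR, not the alternative measure the paper proposes.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Standard dynamic programming that keeps only the previous row.
    """
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the best subsequence ending before both characters.
                curr.append(prev[j - 1] + 1)
            else:
                # Skip a character from one word or the other.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a: str, b: str) -> float:
    """Longest common subsequence ratio: LCS length over the longer word's length.

    Returning 0.0 for two empty strings is a convention chosen for this sketch.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return lcs_length(a, b) / longest


if __name__ == "__main__":
    # Hypothetical English-French pair; the shared subsequence is "colur".
    print(lcsr("colour", "couleur"))
```

Because the denominator is the length of the longer word, the score depends on word length as well as on the amount of shared material, which is the relationship the abstract says the paper investigates.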
Anthology ID:
2005.mtsummit-papers.40
Volume:
Proceedings of Machine Translation Summit X: Papers
Month:
September 13-15
Year:
2005
Address:
Phuket, Thailand
Venue:
MTSummit
Pages:
305–312
URL:
https://aclanthology.org/2005.mtsummit-papers.40
Cite (ACL):
Grzegorz Kondrak. 2005. Cognates and Word Alignment in Bitexts. In Proceedings of Machine Translation Summit X: Papers, pages 305–312, Phuket, Thailand.
Cite (Informal):
Cognates and Word Alignment in Bitexts (Kondrak, MTSummit 2005)
PDF:
https://aclanthology.org/2005.mtsummit-papers.40.pdf