Alignment Analysis of Sequential Segmentation of Lexicons to Improve Automatic Cognate Detection

Pranav A


Abstract
Ranking functions in information retrieval are often used in search engines to retrieve the answers most relevant to a query. This paper applies this notion of information retrieval to the problem domain of cognate detection. The main contributions of this paper are: (1) positional tokenization, which incorporates the sequential order of segments into the tokens; (2) graphical error modelling, which calculates morphological shifts. Existing research only determines whether a given pair of words are cognates; we additionally study whether a possible cognate can be predicted from a given input word. Our study shows that language-modelling-based retrieval functions with positional tokenization and error modelling tend to give better results than competing baselines.
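The abstract does not spell out the tokenization or the retrieval function. The following Python sketch illustrates one plausible reading under stated assumptions. Positional tokenization is taken to mean character bigrams prefixed with their start index, so the same bigram at different positions counts as a different token. The language-modelling retrieval function is taken to be standard query likelihood with Dirichlet smoothing. The function names `positional_tokens` and `rank_candidates`, the bigram size, the boundary marker `#`, the smoothing parameter `mu`, and the example words are all hypothetical, and the graphical error model is not represented.

```python
import math
from collections import Counter


def positional_tokens(word: str, n: int = 2) -> list[str]:
    """Hypothetical positional tokenizer (an assumption, not the paper's exact scheme).

    Pads the word with '#' boundary markers and returns character n-grams
    tagged with their start index, e.g. "kind" -> ["0:#k", "1:ki", ...].
    """
    padded = f"#{word}#"
    return [f"{i}:{padded[i:i + n]}" for i in range(len(padded) - n + 1)]


def rank_candidates(query: str, candidates: list[str],
                    mu: float = 10.0, n: int = 2) -> list[tuple[str, float]]:
    """Rank candidate cognates by Dirichlet-smoothed query likelihood.

    Each candidate word is treated as a "document" of positional tokens;
    the query word's tokens are scored against each candidate.
    """
    docs = {c: Counter(positional_tokens(c, n)) for c in candidates}

    # Background (collection) model built from all candidate tokens.
    collection: Counter = Counter()
    for counts in docs.values():
        collection.update(counts)
    total = sum(collection.values())
    vocab = len(collection)

    def p_collection(token: str) -> float:
        # Add-one smoothed background probability, so query tokens unseen
        # in every candidate still receive nonzero mass (assumed choice).
        return (collection[token] + 1) / (total + vocab + 1)

    q_tokens = positional_tokens(query, n)
    scores = []
    for cand, counts in docs.items():
        length = sum(counts.values())
        # log p(q | d) = sum_t log((c(t, d) + mu * p(t | C)) / (|d| + mu))
        score = sum(
            math.log((counts[t] + mu * p_collection(t)) / (length + mu))
            for t in q_tokens
        )
        scores.append((cand, score))
    # Highest log-likelihood first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    ranking = rank_candidates("night", ["nacht", "noche", "nuit", "tag"])
    print(ranking[0][0])  # -> nacht
```

Because tokens carry their position, `nuit` gets no credit for ending in `t` at a different index than `night` does, while `nacht` matches `night` at the start and in its final two bigrams.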
Anthology ID:
P18-3019
Volume:
Proceedings of ACL 2018, Student Research Workshop
Month:
July
Year:
2018
Address:
Melbourne, Australia
Editors:
Vered Shwartz, Jeniya Tabassum, Rob Voigt, Wanxiang Che, Marie-Catherine de Marneffe, Malvina Nissim
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
134–140
URL:
https://aclanthology.org/P18-3019
DOI:
10.18653/v1/P18-3019
Cite (ACL):
Pranav A. 2018. Alignment Analysis of Sequential Segmentation of Lexicons to Improve Automatic Cognate Detection. In Proceedings of ACL 2018, Student Research Workshop, pages 134–140, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
Alignment Analysis of Sequential Segmentation of Lexicons to Improve Automatic Cognate Detection (A, ACL 2018)
PDF:
https://aclanthology.org/P18-3019.pdf
Code:
pranav-ust/cognates