Development and evaluation of phonological models for cognate identification

Bogdan Babych

Abstract

The paper presents a methodology for the development and task-based evaluation of phonological models, which improve the accuracy of cognate terminology identification and may also be used for other applications, such as transliteration or improving character-based NMT. Terminology translation remains a bottleneck for MT, especially for under-resourced languages and domains, and automated identification of cognate terms addresses this problem. The proposed phonological models explicitly represent distinctive phonological features for each character, such as acoustic types (e.g., vowel/consonant, voiced/unvoiced/sonant) and place and manner of articulation (closed/open and front/back vowels; plosive or fricative and labial, dental or glottal consonants). The advantage of such representations is that they make information about the internal structure of characters explicit, rather than treating characters as atomic units of comparison. This places graphemes into a feature space that provides additional information about their articulatory (pronunciation-based) or acoustic (sound-based) distances and similarity. The article presents experimental results of using the proposed phonological models to extract cognate terminology with a phonologically aware Levenshtein edit distance, which outperforms the baseline character-based Levenshtein distance by 16.5% on the Top-1 cognate ranking metric. Project resources are available at: https://github.com/bogdanbabych/cognates-phonology
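To illustrate how a phonologically aware edit distance differs from plain character-based Levenshtein, here is a minimal Python sketch. It assumes a hypothetical, heavily simplified feature inventory (the `FEATURES` table covers only a handful of Latin letters) and uses the Jaccard distance between two characters' feature sets as the substitution cost. Both choices are illustrative assumptions, not the feature sets or cost function of the released models.

```python
from typing import Dict, FrozenSet

# Hypothetical, simplified feature inventory (illustration only; not the
# paper's actual phonological models). Each letter maps to a set of features
# covering acoustic type, manner and place of articulation.
FEATURES: Dict[str, FrozenSet[str]] = {
    "p": frozenset({"consonant", "unvoiced", "plosive", "labial"}),
    "b": frozenset({"consonant", "voiced", "plosive", "labial"}),
    "t": frozenset({"consonant", "unvoiced", "plosive", "dental"}),
    "d": frozenset({"consonant", "voiced", "plosive", "dental"}),
    "f": frozenset({"consonant", "unvoiced", "fricative", "labial"}),
    "v": frozenset({"consonant", "voiced", "fricative", "labial"}),
    "s": frozenset({"consonant", "unvoiced", "fricative", "dental"}),
    "m": frozenset({"consonant", "sonant", "nasal", "labial"}),
    "n": frozenset({"consonant", "sonant", "nasal", "dental"}),
    "a": frozenset({"vowel", "open", "back"}),
    "e": frozenset({"vowel", "mid", "front"}),
    "i": frozenset({"vowel", "closed", "front"}),
    "o": frozenset({"vowel", "mid", "back"}),
    "u": frozenset({"vowel", "closed", "back"}),
}


def substitution_cost(a: str, b: str) -> float:
    """Cost in [0, 1]: Jaccard distance between the two feature sets.

    Identical characters cost 0; characters missing from the (hypothetical)
    table fall back to the plain Levenshtein substitution cost of 1.
    """
    if a == b:
        return 0.0
    fa, fb = FEATURES.get(a), FEATURES.get(b)
    if fa is None or fb is None:
        return 1.0
    return 1.0 - len(fa & fb) / len(fa | fb)


def phonological_levenshtein(s: str, t: str, indel: float = 1.0) -> float:
    """Weighted Levenshtein distance with feature-based substitution costs."""
    # prev[j] holds the distance between s[:i-1] and t[:j] (row-by-row DP).
    prev = [j * indel for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [i * indel] + [0.0] * len(t)
        for j in range(1, len(t) + 1):
            curr[j] = min(
                prev[j] + indel,       # delete s[i-1]
                curr[j - 1] + indel,   # insert t[j-1]
                prev[j - 1] + substitution_cost(s[i - 1], t[j - 1]),  # substitute
            )
        prev = curr
    return prev[-1]
```

With this table, `phonological_levenshtein("pater", "bater")` is 0.4, because `p` and `b` differ only in voicing, whereas plain Levenshtein assigns both this pair and `pater`/`kater` a distance of 1. Phonologically close substitutions therefore rank candidate cognates higher, which is the kind of effect a Top-1 ranking evaluation would measure.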

Anthology ID: 2018.eamt-main.4
Volume: Proceedings of the 21st Annual Conference of the European Association for Machine Translation
Month: May
Year: 2018
Address: Alicante, Spain
Editors: Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, Miquel Esplà-Gomis, Maja Popović, Celia Rico, André Martins, Joachim Van den Bogaert, Mikel L. Forcada
Venue: EAMT
Pages: 61–68
URL: https://aclanthology.org/2018.eamt-main.4/
Cite (ACL): Bogdan Babych. 2018. Development and evaluation of phonological models for cognate identification. In Proceedings of the 21st Annual Conference of the European Association for Machine Translation, pages 61–68, Alicante, Spain.
Cite (Informal): Development and evaluation of phonological models for cognate identification (Babych, EAMT 2018)
PDF: https://aclanthology.org/2018.eamt-main.4.pdf