Automating Interlingual Homograph Recognition with Parallel Sentences

Yi Han, Ryohei Sasano, Koichi Takeda


Abstract
Interlingual homographs are words that are spelled the same but have different meanings across languages. Distinguishing interlingual homographs from other form-identical words generally requires linguistic knowledge and extensive annotation work. In this paper, we propose an automatic interlingual homograph recognition method based on cross-lingual word embedding similarity and the co-occurrence of form-identical words in parallel sentences. We conduct experiments with various off-the-shelf language models, combined with cross-lingual alignment operations and co-occurrence metrics, on the Chinese-Japanese and English-Dutch language pairs. Experimental results demonstrate that our proposed method makes accurate and consistent predictions across languages.
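The two signals named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the co-occurrence definition, the equal weighting in `homograph_score`, the toy English-Dutch sentence pairs, and the two-dimensional vectors are all assumptions made for illustration, and the embeddings are assumed to be already aligned into a shared cross-lingual space.

```python
import math

def cooccurrence_rate(word, sentence_pairs):
    """Among sentence pairs containing `word` on either side,
    return the fraction that contain it on both sides (assumed metric)."""
    either = both = 0
    for src_tokens, tgt_tokens in sentence_pairs:
        in_src = word in src_tokens
        in_tgt = word in tgt_tokens
        if in_src or in_tgt:
            either += 1
            if in_src and in_tgt:
                both += 1
    return both / either if either else 0.0

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def homograph_score(word, sentence_pairs, src_vec, tgt_vec):
    """Hypothetical combined score: higher means more homograph-like,
    i.e. low embedding similarity and low co-occurrence."""
    sim = cosine(src_vec, tgt_vec)
    co = cooccurrence_rate(word, sentence_pairs)
    return 1.0 - 0.5 * (sim + co)

# Toy English-Dutch parallel data: Dutch "room" means "cream".
pairs = [
    (["the", "room", "is", "big"], ["de", "kamer", "is", "groot"]),
    (["add", "cream"], ["voeg", "room", "toe"]),
    (["water", "is", "cold"], ["water", "is", "koud"]),
]

room_score = homograph_score("room", pairs, [1.0, 0.0], [0.0, 1.0])
water_score = homograph_score("water", pairs, [1.0, 0.0], [1.0, 0.0])
```

In this toy setup, "room" never appears on both sides of a sentence pair and its two vectors are orthogonal, so it scores as homograph-like, whereas "water" co-occurs in its pair and has identical vectors, so it scores as a likely cognate.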
Anthology ID:
2022.findings-aacl.20
Volume:
Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022
Month:
November
Year:
2022
Address:
Online only
Editors:
Yulan He, Heng Ji, Sujian Li, Yang Liu, Chua-Hui Chang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
211–216
URL:
https://aclanthology.org/2022.findings-aacl.20
Cite (ACL):
Yi Han, Ryohei Sasano, and Koichi Takeda. 2022. Automating Interlingual Homograph Recognition with Parallel Sentences. In Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022, pages 211–216, Online only. Association for Computational Linguistics.
Cite (Informal):
Automating Interlingual Homograph Recognition with Parallel Sentences (Han et al., Findings 2022)
PDF:
https://aclanthology.org/2022.findings-aacl.20.pdf