A High Coverage Method for Automatic False Friends Detection for Spanish and Portuguese

Santiago Castro, Jairo Bonanata, Aiala Rosá


Abstract
False friends are words in two languages that look or sound similar but have different meanings. They are a common source of confusion among language learners. Methods to detect them automatically exist; however, they either rely on large aligned bilingual corpora, which are hard to find and expensive to build, or struggle with infrequent words. In this work we propose a high coverage method that uses word vector representations to build a false friends classifier for any pair of languages, which we apply to the particular case of Spanish and Portuguese. The required resources are a large corpus for each language and a small bilingual lexicon for the pair.
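The abstract does not say how vectors from the two languages are compared, so the following is only a minimal sketch of one plausible reading: a linear map between the two embedding spaces is fitted on the bilingual lexicon, and a similar-looking pair is flagged when the mapped Spanish vector lies far from the Portuguese one. The function names, the fixed cosine threshold (standing in for a trained classifier), and the 2-D toy vectors are all assumptions made for illustration; none of them come from the paper.

```python
import numpy as np

def fit_linear_map(src_vecs, tgt_vecs):
    """Least-squares matrix W such that src_vecs @ W approximates tgt_vecs."""
    W, *_ = np.linalg.lstsq(src_vecs, tgt_vecs, rcond=None)
    return W

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def is_false_friend(es_vec, pt_vec, W, threshold=0.5):
    """Flag a pair when the mapped Spanish vector is far from the Portuguese one.

    The threshold is a hypothetical stand-in for a trained classifier.
    """
    return cosine(es_vec @ W, pt_vec) < threshold

# Toy data (invented): the Portuguese space is the Spanish space rotated by 90 degrees.
rotation = np.array([[0.0, 1.0], [-1.0, 0.0]])
lexicon_es = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # vectors of lexicon entries
lexicon_pt = lexicon_es @ rotation                           # vectors of their translations

W = fit_linear_map(lexicon_es, lexicon_pt)

es_word = np.array([2.0, 1.0])
pt_same_meaning = es_word @ rotation        # lands where the mapped Spanish vector does
pt_other_meaning = np.array([2.0, 1.0])     # lands elsewhere in the Portuguese space

cognate_flag = is_false_friend(es_word, pt_same_meaning, W)
false_friend_flag = is_false_friend(es_word, pt_other_meaning, W)
```

In a real setting the vectors would come from embeddings trained on the large monolingual corpora the abstract mentions, and the decision rule would be learned from labelled pairs rather than fixed by hand.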
Anthology ID:
W18-3903
Volume:
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Editors:
Marcos Zampieri, Preslav Nakov, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi, Ahmed Ali
Venue:
VarDial
Publisher:
Association for Computational Linguistics
Pages:
29–36
URL:
https://aclanthology.org/W18-3903
Cite (ACL):
Santiago Castro, Jairo Bonanata, and Aiala Rosá. 2018. A High Coverage Method for Automatic False Friends Detection for Spanish and Portuguese. In Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018), pages 29–36, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
A High Coverage Method for Automatic False Friends Detection for Spanish and Portuguese (Castro et al., VarDial 2018)
PDF:
https://aclanthology.org/W18-3903.pdf
Presentation:
W18-3903.Presentation.pdf
Code:
pln-fing-udelar/false-friends