Measure and Evaluation of Semantic Divergence across Two Languages

Syrielle Montariol, Alexandre Allauzen


Abstract
Languages are dynamic systems: word usage may change over time, reflecting various societal factors. However, not all languages evolve identically: the impact of an event or the influence of a trend or way of thinking can differ between communities. In this paper, we propose to track these divergences by comparing the evolution of a word and its translation across two languages. We investigate several methods of building time-varying and bilingual word embeddings, using contextualised and non-contextualised embeddings. We propose a set of scenarios to characterize semantic divergence across two languages, along with a setup to differentiate them in a bilingual corpus. We evaluate the different methods by generating a corpus of synthetic semantic change across two languages, English and French, before applying them to newspaper corpora to detect bilingual semantic divergence and provide qualitative insight for the task. We conclude that BERT embeddings coupled with a clustering step lead to the best performance on synthetic corpora; however, CBOW embeddings are very competitive and better suited to exploratory analysis of a large corpus.
Anthology ID:
2021.acl-long.100
Volume:
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:
August
Year:
2021
Address:
Online
Editors:
Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
Venues:
ACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
1247–1258
URL:
https://aclanthology.org/2021.acl-long.100
DOI:
10.18653/v1/2021.acl-long.100
Cite (ACL):
Syrielle Montariol and Alexandre Allauzen. 2021. Measure and Evaluation of Semantic Divergence across Two Languages. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1247–1258, Online. Association for Computational Linguistics.
Cite (Informal):
Measure and Evaluation of Semantic Divergence across Two Languages (Montariol & Allauzen, ACL-IJCNLP 2021)
PDF:
https://aclanthology.org/2021.acl-long.100.pdf
Video:
https://aclanthology.org/2021.acl-long.100.mp4