Detecting Fine-Grained Cross-Lingual Semantic Divergences without Supervision by Learning to Rank

Eleftheria Briakou; Marine Carpuat

doi:10.18653/v1/2020.emnlp-main.121

Abstract

Detecting fine-grained differences in content conveyed in different languages matters for cross-lingual NLP and multilingual corpora analysis, but it is a challenging machine learning problem since annotation is expensive and hard to scale. This work improves the prediction and annotation of fine-grained semantic divergences. We introduce a training strategy for multilingual BERT models by learning to rank synthetic divergent examples of varying granularity. We evaluate our models on the Rationalized English-French Semantic Divergences, a new dataset released with this work, consisting of English-French sentence-pairs annotated with semantic divergence classes and token-level rationales. Learning to rank helps detect fine-grained sentence-level divergences more accurately than a strong sentence-level similarity model, while token-level predictions have the potential of further distinguishing between coarse and fine-grained divergences.

Anthology ID: 2020.emnlp-main.121
Volume: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month: November
Year: 2020
Address: Online
Editors: Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 1563–1580
URL: https://aclanthology.org/2020.emnlp-main.121/
DOI: 10.18653/v1/2020.emnlp-main.121
Cite (ACL): Eleftheria Briakou and Marine Carpuat. 2020. Detecting Fine-Grained Cross-Lingual Semantic Divergences without Supervision by Learning to Rank. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1563–1580, Online. Association for Computational Linguistics.
Cite (Informal): Detecting Fine-Grained Cross-Lingual Semantic Divergences without Supervision by Learning to Rank (Briakou & Carpuat, EMNLP 2020)
PDF: https://aclanthology.org/2020.emnlp-main.121.pdf
Video: https://slideslive.com/38939142
Code: Elbria/xling-SemDiv
Data: REFreSD, WikiMatrix
