Identifying Semantic Divergences in Parallel Text without Annotations

Yogarshi Vyas, Xing Niu, Marine Carpuat


Abstract
Recognizing that even correct translations are not always semantically equivalent, we automatically detect meaning divergences in parallel sentence pairs with a deep neural model of bilingual semantic similarity which can be trained for any parallel corpus without any manual annotation. We show that our semantic model detects divergences more accurately than models based on surface features derived from word alignments, and that these divergences matter for neural machine translation.
Anthology ID:
N18-1136
Volume:
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
Month:
June
Year:
2018
Address:
New Orleans, Louisiana
Editors:
Marilyn Walker, Heng Ji, Amanda Stent
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
1503–1515
URL:
https://aclanthology.org/N18-1136
DOI:
10.18653/v1/N18-1136
Cite (ACL):
Yogarshi Vyas, Xing Niu, and Marine Carpuat. 2018. Identifying Semantic Divergences in Parallel Text without Annotations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1503–1515, New Orleans, Louisiana. Association for Computational Linguistics.
Cite (Informal):
Identifying Semantic Divergences in Parallel Text without Annotations (Vyas et al., NAACL 2018)
PDF:
https://aclanthology.org/N18-1136.pdf
Dataset:
N18-1136.Datasets.tgz
Video:
https://aclanthology.org/N18-1136.mp4
Code:
yogarshi/SemDiverge
Data:
OpenSubtitles