Parallel Sentence Extraction from Comparable Corpora with Neural Network Features

Chenhui Chu, Raj Dabre, Sadao Kurohashi


Abstract
Parallel corpora are crucial for machine translation (MT); however, they are quite scarce for most language pairs and domains. As comparable corpora are far more widely available, many studies have been conducted to extract parallel sentences from them for MT. In this paper, we exploit the neural network features acquired from neural MT for parallel sentence extraction. We observe significant improvements in both sentence extraction accuracy and MT performance.
Anthology ID:
L16-1468
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2931–2935
URL:
https://aclanthology.org/L16-1468/
Cite (ACL):
Chenhui Chu, Raj Dabre, and Sadao Kurohashi. 2016. Parallel Sentence Extraction from Comparable Corpora with Neural Network Features. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 2931–2935, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Parallel Sentence Extraction from Comparable Corpora with Neural Network Features (Chu et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1468.pdf