On Machine Translation of User Reviews

Maja Popović, Alberto Poncelas, Marija Brkic, Andy Way


Abstract
This work investigates neural machine translation (NMT) systems for translating English user reviews into Croatian and Serbian, two closely related, morphologically complex languages. Two types of reviews are used for testing the systems: IMDb movie reviews and Amazon product reviews. Two types of training data are explored: large out-of-domain bilingual parallel corpora, and a small synthetic in-domain parallel corpus obtained by machine translation of monolingual English Amazon reviews into the target languages. Both automatic scores and human evaluation show that using the synthetic in-domain corpus together with a selected subset of out-of-domain data is the best option. Separate results on IMDb and Amazon reviews indicate that MT systems perform differently on different review types, so user reviews should generally not be considered a homogeneous genre. Nevertheless, more detailed research on a larger number of reviews covering different domains/topics is needed to fully understand these differences.
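The abstract does not say how the subset of out-of-domain data was selected. Purely as an illustration, and not as the authors' method, the sketch below implements a simplified n-gram-coverage selection in the spirit of Feature Decay Algorithms. Out-of-domain source sentences are ranked by how many of their n-grams also occur in in-domain text (for example, English reviews), and the value of an n-gram is decayed each time a selected sentence covers it, so later picks favour new material. The function names `ngrams` and `select_subset`, whitespace tokenisation, lowercasing, n-grams up to order 3 and the decay factor of 0.5 are all assumptions made for this example.

```python
def ngrams(tokens, max_n=3):
    """All n-grams of orders 1..max_n, as tuples (duplicates kept)."""
    return [tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def select_subset(in_domain, candidates, k, max_n=3, decay=0.5):
    """Greedily return indices of up to k candidate sentences that best
    cover the in-domain n-grams (hypothetical, simplified FDA-style sketch)."""
    # Every n-gram seen in the in-domain text starts with value 1.0.
    value = {g: 1.0
             for sent in in_domain
             for g in ngrams(sent.lower().split(), max_n)}
    # Candidate index -> its n-grams (whitespace tokenisation, lowercased).
    pool = {idx: ngrams(s.lower().split(), max_n)
            for idx, s in enumerate(candidates)}

    def score(grams):
        # Mean value over the sentence's n-grams; unseen n-grams count as 0.
        if not grams:
            return 0.0
        return sum(value.get(g, 0.0) for g in grams) / len(grams)

    selected = []
    for _ in range(min(k, len(pool))):
        # Ties are broken by the original candidate order.
        best = max(pool, key=lambda idx: score(pool[idx]))
        selected.append(best)
        # Decay the value of every in-domain n-gram the chosen sentence covers.
        for g in set(pool.pop(best)):
            if g in value:
                value[g] *= decay
    return selected


if __name__ == "__main__":
    in_domain = ["The battery life is great"]
    candidates = ["The parliament adopted the resolution",
                  "Battery life is short and poor",
                  "Great battery"]
    print(select_subset(in_domain, candidates, k=2))  # → [2, 1]
```

Normalising by the number of n-grams keeps long sentences from winning merely by length. The decay step is what separates this from plain overlap ranking: once "battery" is covered, a second sentence is rewarded mainly for in-domain n-grams not yet selected.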
Anthology ID: 2021.ranlp-1.124
Volume: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Month: September
Year: 2021
Address: Held Online
Editors: Ruslan Mitkov, Galia Angelova
Venue: RANLP
Publisher: INCOMA Ltd.
Pages: 1109–1118
URL: https://aclanthology.org/2021.ranlp-1.124
Cite (ACL): Maja Popović, Alberto Poncelas, Marija Brkic, and Andy Way. 2021. On Machine Translation of User Reviews. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 1109–1118, Held Online. INCOMA Ltd.
Cite (Informal): On Machine Translation of User Reviews (Popović et al., RANLP 2021)
PDF: https://aclanthology.org/2021.ranlp-1.124.pdf
Data: IMDb Movie Reviews