Enhancing Cross-border EU E-commerce through Machine Translation: Needed Language Resources, Challenges and Opportunities

Meritxell Fernández Barrera, Vladimir Popescu, Antonio Toral, Federico Gaspari, Khalid Choukri

Abstract
This paper discusses the role that statistical machine translation (SMT) can play in the development of cross-border EU e-commerce, highlighting extant obstacles and identifying relevant technologies to overcome them. First, it proposes a typology of static and dynamic e-commerce textual genres and identifies those that SMT may target most successfully. The specific challenges of automatically translating user-generated content are discussed in detail. Second, the paper highlights the risk of data sparsity inherent to e-commerce and explores state-of-the-art strategies for achieving domain adequacy via adaptation. Third, it proposes a robust workflow for developing SMT systems adapted to the e-commerce domain using inexpensive methods. Given the scarcity of user-generated language corpora for most language pairs, the paper proposes using crowdsourcing to obtain monolingual target-language data for training language models and aligned parallel corpora for tuning and evaluating MT systems.
Anthology ID:
L16-1721
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
4550–4556
URL:
https://aclanthology.org/L16-1721
Cite (ACL):
Meritxell Fernández Barrera, Vladimir Popescu, Antonio Toral, Federico Gaspari, and Khalid Choukri. 2016. Enhancing Cross-border EU E-commerce through Machine Translation: Needed Language Resources, Challenges and Opportunities. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 4550–4556, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Enhancing Cross-border EU E-commerce through Machine Translation: Needed Language Resources, Challenges and Opportunities (Barrera et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1721.pdf