A Case Study on the Importance of Named Entities in a Machine Translation Pipeline for Customer Support Content

Miguel Menezes, Vera Cabarrão, Pedro Mota, Helena Moniz, Alon Lavie


Abstract
This paper describes research carried out at Unbabel, a Portuguese machine translation start-up that combines MT with human post-editing and focuses strictly on customer service content. We aim to contribute to furthering MT quality and good practices by showing the importance of a robust, continuously developed Named Entity Recognition system that is compliant with the General Data Protection Regulation (GDPR). Moreover, we have tested semiautomatic strategies that support and enhance the creation of Named Entity gold standards, allowing a more seamless implementation of multilingual Named Entity Recognition systems. The project described in this paper is the result of a year of joint work between Unbabel's linguists and Unbabel's AI engineering team. The project should also be taken as a statement of multidisciplinarity, demonstrating the much-needed articulation between the different scientific fields that make up the area of Natural Language Processing (NLP).
Anthology ID:
2022.eamt-1.24
Volume:
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation
Month:
June
Year:
2022
Address:
Ghent, Belgium
Editors:
Helena Moniz, Lieve Macken, Andrew Rufener, Loïc Barrault, Marta R. Costa-jussà, Christophe Declercq, Maarit Koponen, Ellie Kemp, Spyridon Pilos, Mikel L. Forcada, Carolina Scarton, Joachim Van den Bogaert, Joke Daems, Arda Tezcan, Bram Vanroy, Margot Fonteyne
Venue:
EAMT
Publisher:
European Association for Machine Translation
Pages:
211–219
URL:
https://aclanthology.org/2022.eamt-1.24
Cite (ACL):
Miguel Menezes, Vera Cabarrão, Pedro Mota, Helena Moniz, and Alon Lavie. 2022. A Case Study on the Importance of Named Entities in a Machine Translation Pipeline for Customer Support Content. In Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, pages 211–219, Ghent, Belgium. European Association for Machine Translation.
Cite (Informal):
A Case Study on the Importance of Named Entities in a Machine Translation Pipeline for Customer Support Content (Menezes et al., EAMT 2022)
PDF:
https://aclanthology.org/2022.eamt-1.24.pdf