Rachel Grewcock
2023
Context-aware and gender-neutral Translation Memories
Marjolene Paulo
|
Vera Cabarrão
|
Helena Moniz
|
Miguel Menezes
|
Rachel Grewcock
|
Eduardo Farah
Proceedings of the 24th Annual Conference of the European Association for Machine Translation
This work proposes an approach to use Part-Of-Speech (POS) information to automatically detect context-dependent Translation Units (TUs) from a Translation Memory database pertaining to the customer support domain. In line with our goal to minimize context-dependency in TUs, we show how this mechanism can be deployed to create new gender-neutral and context-independent TUs. Our experiments, conducted across Portuguese (PT), Brazilian Portuguese (PT-BR), Spanish (ES), and Spanish-Latam (ES-LATAM), show that the occurrence of certain POS with specific words is accurate in identifying context dependency. In a cross-client analysis, we found that ~10% of the most frequent 13,200 TUs were context-dependent, with gender determining context-dependency in 98% of all confirmed cases. We used these findings to suggest gender-neutral equivalents for the most frequent TUs with gender constraints. Our approach is in use in the Unbabel translation pipeline, and can be integrated into any other Neural Machine Translation (NMT) pipeline.