Feeding NMT a Healthy Diet – The Impact of Quality, Quantity, or the Right Type of Nutrients

Abdallah Nasir, Sara Alisis, Ruba W Jaikat, Rebecca Jonsson, Sara Qardan, Eyas Shawahneh, Nour Al-Khdour


Abstract
In the era of gigantic language models, and in our case Neural Machine Translation (NMT) models, where size alone seems to matter, we have been asking ourselves: is it healthy to just feed our NMT models more and more data? In this presentation, we show our findings on how the different data “nutrients” we fed our models affect NMT performance. We explored the impact of the quantity, quality, and type of data we feed to our English-Arabic NMT models. The presentation will show the impact of adding millions of parallel sentences to our training data, as opposed to using a much smaller data set of much higher quality, along with the results of additional experiments with different data nutrients. We will highlight our learnings and challenges, share insights from our Linguistics Quality Assurance team on the advantages and disadvantages of each type of data source, and define the criteria for high-quality data with respect to a healthy NMT diet.
Anthology ID:
2022.amta-upg.21
Volume:
Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas (Volume 2: Users and Providers Track and Government Track)
Month:
September
Year:
2022
Address:
Orlando, USA
Editors:
Janice Campbell, Stephen Larocca, Jay Marciano, Konstantin Savenkov, Alex Yanishevsky
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
Pages:
293–312
URL:
https://aclanthology.org/2022.amta-upg.21
Cite (ACL):
Abdallah Nasir, Sara Alisis, Ruba W Jaikat, Rebecca Jonsson, Sara Qardan, Eyas Shawahneh, and Nour Al-Khdour. 2022. Feeding NMT a Healthy Diet – The Impact of Quality, Quantity, or the Right Type of Nutrients. In Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas (Volume 2: Users and Providers Track and Government Track), pages 293–312, Orlando, USA. Association for Machine Translation in the Americas.
Cite (Informal):
Feeding NMT a Healthy Diet – The Impact of Quality, Quantity, or the Right Type of Nutrients (Nasir et al., AMTA 2022)
Presentation:
2022.amta-upg.21.Presentation.pdf