FAME-MT Dataset: Formality Awareness Made Easy for Machine Translation Purposes

Dawid Wisniewski, Zofia Rostek, Artur Nowakowski


Abstract
People use language for various purposes. Apart from sharing information, individuals may use it to express emotions or to show respect for another person. In this paper, we focus on the formality level of machine-generated translations and present FAME-MT, a dataset consisting of 11.2 million translations between 15 European source languages and 8 European target languages, classified into formal and informal classes according to target sentence formality. This dataset can be used to fine-tune machine translation models to ensure a given formality level for the 8 European target languages considered. We describe the dataset creation procedure and an analysis of the dataset’s quality showing that FAME-MT is a reliable source of language register information, and we construct a publicly available proof-of-concept machine translation model that uses the dataset to steer the formality level of the translation. Currently, it is the largest dataset of formality annotations, with examples expressed in 112 European language pairs. The dataset is made available online.
Anthology ID:
2024.eamt-1.16
Volume:
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)
Month:
June
Year:
2024
Address:
Sheffield, UK
Editors:
Carolina Scarton, Charlotte Prescott, Chris Bayliss, Chris Oakley, Joanna Wright, Stuart Wrigley, Xingyi Song, Edward Gow-Smith, Rachel Bawden, Víctor M Sánchez-Cartagena, Patrick Cadwell, Ekaterina Lapshinova-Koltunski, Vera Cabarrão, Konstantinos Chatzitheodorou, Mary Nurminen, Diptesh Kanojia, Helena Moniz
Venue:
EAMT
Publisher:
European Association for Machine Translation (EAMT)
Pages:
164–180
URL:
https://aclanthology.org/2024.eamt-1.16
Cite (ACL):
Dawid Wisniewski, Zofia Rostek, and Artur Nowakowski. 2024. FAME-MT Dataset: Formality Awareness Made Easy for Machine Translation Purposes. In Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1), pages 164–180, Sheffield, UK. European Association for Machine Translation (EAMT).
Cite (Informal):
FAME-MT Dataset: Formality Awareness Made Easy for Machine Translation Purposes (Wisniewski et al., EAMT 2024)
PDF:
https://aclanthology.org/2024.eamt-1.16.pdf