Enhancing Scientific Discourse: Machine Translation for the Scientific Domain

Dimitris Roussis, Sokratis Sofianopoulos, Stelios Piperidis


Abstract
The increasing volume of scientific research necessitates effective communication across language barriers. Machine translation (MT) offers a promising solution for accessing international publications. However, the scientific domain presents unique challenges due to its specialized vocabulary and complex sentence structures. In this paper, we present the development of a collection of parallel and monolingual corpora from the scientific domain. The corpora target the language pairs Spanish-English, French-English, and Portuguese-English. For each language pair, we create a large general scientific corpus as well as four smaller corpora focused on the research domains of Energy Research, Neuroscience, Cancer, and Transportation. To evaluate the quality of these corpora, we utilize them for fine-tuning general-purpose neural machine translation (NMT) systems. We describe the corpus creation process and the fine-tuning strategies employed, and conclude with the evaluation results.
Anthology ID:
2024.eamt-1.23
Volume:
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)
Month:
June
Year:
2024
Address:
Sheffield, UK
Editors:
Carolina Scarton, Charlotte Prescott, Chris Bayliss, Chris Oakley, Joanna Wright, Stuart Wrigley, Xingyi Song, Edward Gow-Smith, Rachel Bawden, Víctor M Sánchez-Cartagena, Patrick Cadwell, Ekaterina Lapshinova-Koltunski, Vera Cabarrão, Konstantinos Chatzitheodorou, Mary Nurminen, Diptesh Kanojia, Helena Moniz
Venue:
EAMT
Publisher:
European Association for Machine Translation (EAMT)
Pages:
275–285
URL:
https://aclanthology.org/2024.eamt-1.23
Cite (ACL):
Dimitris Roussis, Sokratis Sofianopoulos, and Stelios Piperidis. 2024. Enhancing Scientific Discourse: Machine Translation for the Scientific Domain. In Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1), pages 275–285, Sheffield, UK. European Association for Machine Translation (EAMT).
Cite (Informal):
Enhancing Scientific Discourse: Machine Translation for the Scientific Domain (Roussis et al., EAMT 2024)
PDF:
https://aclanthology.org/2024.eamt-1.23.pdf