Dimitris Roussis
2024
Enhancing Scientific Discourse: Machine Translation for the Scientific Domain
Dimitris Roussis
|
Sokratis Sofianopoulos
|
Stelios Piperidis
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)
The increasing volume of scientific research necessitates effective communication across language barriers. Machine translation (MT) offers a promising solution for accessing international publications. However, the scientific domain presents unique challenges due to its specialized vocabulary and complex sentence structures. In this paper, we present the development of a collection of parallel and monolingual corpora from the scientific domain. The corpora target the language pairs Spanish-English, French-English, and Portuguese-English. For each language pair, we create a large general scientific corpus as well as four smaller corpora focused on the research domains of: Energy Research, Neuroscience, Cancer and Transportation. To evaluate the quality of these corpora, we utilize them for fine-tuning general-purpose neural machine translation (NMT) systems. We provide details regarding the corpus creation process, the fine-tuning strategies employed, and we conclude with the evaluation results.