Proceedings of the First Workshop on NLP Tools and Resources for Translation and Interpreting Applications

Raquel Lázaro Gutiérrez, Antonio Pareja, Ruslan Mitkov (Editors)

Anthology ID: 2023.nlp4tia-1
Month: September
Year: 2023
Address: Varna, Bulgaria
Venues: NLP4TIA | WS
Events: First Workshop on NLP Tools and Resources for Translation and Interpreting Applications (2023) | International Conference Recent Advances in Natural Language Processing (2023) | Other Workshops and Events (2023)
Publisher: INCOMA Ltd., Shoumen, Bulgaria
URL: https://aclanthology.org/2023.nlp4tia-1/
PDF: https://aclanthology.org/2023.nlp4tia-1.pdf

Natural Language Processing tools and resources for translation and interpreting applications. Introduction
Raquel Lázaro Gutiérrez

Machine translation, translation errors, and adequacy: Spanish-English vs. Spanish-Romanian
Laura Monguilod | Bianca Vitalaru

This paper has two objectives: (1) to analyse the adequacy of using neural machine translation (NMT) for the translation of health information (from Spanish into English and Romanian) used in Spanish public health campaigns; and (2) to compare the results for these two language combinations. Results show that post-editing is essential for both language combinations, since the raw translations cannot be used as a primary resource for informing foreign users. Moreover, Romanian translations require more post-editing. However, NMT for informative texts combined with human post-editing can be used as a strategy to benefit from the potential of MT while ensuring the quality of public service translations, depending on the language combination and on the amount of time allotted for the task.

Cross-Lingual Idiom Sense Clustering in German and English
Mohammed Absar

Idioms are expressions with non-literal and non-compositional meanings. For this reason, they pose a unique challenge for various NLP tasks including Machine Translation and Sentiment Analysis. In this paper, we propose an approach to clustering idioms in different languages by their sense. We leverage pre-trained cross-lingual transformer models and fine-tune them to produce cross-lingual vector representations of idioms according to their sense.

Performance Evaluation on Human-Machine Teaming Augmented Machine Translation Enabled by GPT-4
Ming Qian

Translation has been modeled as a multiple-phase process in which pre-editing analyses guide meaning transfer and interlingual restructuring. Present-day machine translation (MT) tools provide no means for source text analysis. Generative AI based on large language models (LLMs), equipped with prompt engineering and fine-tuning capabilities, can enable augmented MT solutions by explicitly including AI- or human-generated analyses/instructions, and/or human-generated reference translations, as pre-editing or interactive inputs. Using an English-to-Chinese translation piece that had been carefully studied during a translator slam event, four types of translation output on 20 text segments were evaluated: human-generated translation, Google Translate MT, instruction-augmented MT using GPT4-LLM, and Human-Machine-Teaming (HMT)-augmented translation based on both human reference translation and instruction using GPT4-LLM. While human translation had the best performance, both augmented MT approaches performed better than un-augmented MT. The HMT-augmented MT performed better than instruction-augmented MT because it combined the guidance and knowledge provided by both human reference translation and style instruction. However, since it is unrealistic to generate sentence-by-sentence human translation as MT input, better approaches to HMT-augmented MT need to be invented. The evaluation showed that generative AI with LLMs can enable new MT workflows that facilitate pre-editing analyses and interactive restructuring and achieve better performance.

The Interpretation System of African Languages in the Senegalese Parliament Debates
Jean Christophe Faye

The present work deals with the interpretation system for local languages in the Senegalese parliament. In other words, it is devoted to the implementation of the simultaneous interpretation system in the Senegalese parliament debates. The Senegalese parliament, in cooperation with the European Parliament and the European Union, implemented, some years ago, a system of interpretation devoted to translating from and into six local languages. But what does the interpretation system consist of? What motivates the choice of six local languages, rather than more or fewer? Why does the Senegalese parliament implement such a system in a country whose official language is French? What are the linguistic consequences of this interpretation system for the local and foreign languages spoken in the Senegalese parliament? How are interpreters recruited? To answer these questions, we have explored the documents and writings related to the implementation of the simultaneous interpretation system in the Senegalese parliament, in particular, and of the interpretation system, in general. Field surveys as well as interviews with some deputies, some interpreters and other people from the administration have also been organized and analyzed in this study. This research has provided us with a great deal of information and with data for the corpus. After the data collection, we moved on to data analysis and obtained the results that we present in the body of the text.

Ngambay-French Neural Machine Translation (sba-Fr)
Toadoum Sari Sakayo | Angela Fan | Lema Logamou Seknewna

In Africa, and the world at large, there is an increasing focus on developing Neural Machine Translation (NMT) systems to overcome language barriers. NMT for low-resource languages is particularly compelling as it involves learning with limited labelled data. However, obtaining a well-aligned parallel corpus for low-resource languages can be challenging. The disparity between the technological advancement of a few global languages and the lack of research on NMT for local languages in Chad is striking. End-to-end NMT trials on low-resource Chadian languages have not been attempted. Additionally, unlike for some other African languages, there is a dearth of online, well-structured data gathering for research in Natural Language Processing. However, a guided approach to data gathering can produce bitext data for many translation pairs that combine a Chadian language with a well-known language that has ample data. In this project, we created the first sba-Fr Dataset, which is a corpus of Ngambay-to-French translations, and fine-tuned three pre-trained models using this dataset. Our experiments show that the M2M100 model outperforms the other models, with high BLEU scores on both original and original+synthetic data. The publicly available bitext dataset can be used for research purposes.

Machine Translation of literary texts: genres, times and systems
Ana Isabel Cespedosa Vázquez | Ruslan Mitkov

Machine Translation (MT) has taken off dramatically in recent years due to the advent of Deep Learning methods, and Neural Machine Translation (NMT) has enhanced the quality of automatic translation significantly. While most work has covered the automatic translation of technical, legal and medical texts, the application of MT to literary texts and the human role in this process have been underexplored. In an effort to bridge the gap in this under-researched area, this paper presents the results of a study which seeks to evaluate the performance of three MT systems applied to two different literary genres: two novels (1984 by George Orwell and Pride and Prejudice by Jane Austen) and two poems (I Felt a Funeral in my Brain by Emily Dickinson and Siren Song by Margaret Atwood), representing different literary periods and timelines. The evaluation was conducted by way of the automatic evaluation metric BLEU to objectively assess the performance of each MT system on each genre. The limitations of this study are also outlined.

sTMS Cloud – A Boutique Translation Project Management System
Nenad Angelov

Demonstration of a Cloud-based Translation Project Management System, called sTMS, developed with the financial support of Operational Programme “Innovation and Competitiveness” 2014-2020 (OPIC), which focuses on enhancing the operational activities of LSPs and MLPs. The idea behind it was to concentrate mainly on the management processes, and not to integrate CAT or MT tools, because we believe that the more functional such systems become, the harder they become to support technically and to keep easy to operate. The key features sTMS provides were developed as a result of the broad experience of Project Managers, the increased requirements of our customers, the digital capabilities of our vendors and, lastly, the need to meet the constantly changing environment of the translation industry.

Leveraging Large Language Models to Extract Terminology
Julie Giguere

Large Language Models (LLMs) have brought us efficient tools for various natural language processing (NLP) tasks. This paper explores the application of LLMs for extracting domain-specific terms from textual data. We will present the advantages and limitations of using LLMs for this task and will highlight the significant improvements they offer over traditional terminology extraction methods such as rule-based and statistical approaches.

ChatGPT for translators: a survey
Constantin Orăsan

This article surveys the most important ways in which translators can use ChatGPT. The focus is on scenarios where ChatGPT supports the work of translators, rather than tries to replace them. A discussion of issues that translators need to consider when using large language models, and ChatGPT in particular, is also provided.