Wafaa Mohammed
2024
On Measuring Context Utilization in Document-Level MT Systems
Wafaa Mohammed
|
Vlad Niculae
Findings of the Association for Computational Linguistics: EACL 2024
Document-level translation models are usually evaluated using general metrics such as BLEU, which are not informative about the benefits of context. Current work on context-aware evaluation, such as contrastive methods, only measure translation accuracy on words that need context for disambiguation. Such measures cannot reveal whether the translation model uses the correct supporting context. We propose to complement accuracy-based evaluation with measures of context utilization. We find that perturbation-based analysis (comparing models’ performance when provided with correct versus random context) is an effective measure of overall context utilization. For a finer-grained phenomenon-specific evaluation, we propose to measure how much the supporting context contributes to handling context-dependent discourse phenomena. We show that automatically-annotated supporting context gives similar conclusions to human-annotated context and can be used as alternative for cases where human annotations are not available. Finally, we highlight the importance of using discourse-rich datasets when assessing context utilization.
Findings of the WMT 2024 Shared Task on Chat Translation
Wafaa Mohammed
|
Sweta Agrawal
|
Amin Farajian
|
Vera Cabarrão
|
Bryan Eikema
|
Ana C Farinha
|
José G. C. De Souza
Proceedings of the Ninth Conference on Machine Translation
This paper presents the findings from the third edition of the Chat Translation Shared Task. As with previous editions, the task involved translating bilingual customer support conversations, specifically focusing on the impact of conversation context in translation quality and evaluation. We also include two new language pairs: English-Korean and English-Dutch, in addition to the set of language pairs from previous editions: English-German, English-French, and English-Brazilian Portuguese.We received 22 primary submissions and 32 contrastive submissions from eight teams, with each language pair having participation from at least three teams. We evaluated the systems comprehensively using both automatic metrics and human judgments via a direct assessment framework.The official rankings for each language pair were determined based on human evaluation scores, considering performance in both translation directions—agent and customer. Our analysis shows that while the systems excelled at translating individual turns, there is room for improvement in overall conversation-level translation quality.
2022
Visual Grounding of Inter-lingual Word-Embeddings
Wafaa Mohammed
|
Hassan Shahmohammadi
|
Hendrik P. A. Lensch
|
R. Harald Baayen
Proceedings of the Workshop on Unimodal and Multimodal Induction of Linguistic Structures (UM-IoS)
Visual grounding of Language aims at enriching textual representations of language with multiple sources of visual knowledge such as images and videos. Although visual grounding is an area of intense research, inter-lingual aspects of visual grounding have not received much attention. The present study investigates the inter-lingual visual grounding of word embeddings. We propose an implicit alignment technique between the two spaces of vision and language in which inter-lingual textual information interacts in order to enrich pre-trained textual word embeddings. We focus on three languages in our experiments, namely, English, Arabic, and German. We obtained visually grounded vector representations for these languages and studied whether visual grounding on one or multiple languages improved the performance of embeddings on word similarity and categorization benchmarks. Our experiments suggest that inter-lingual knowledge improves the performance of grounded embeddings in similar languages such as German and English. However, inter-lingual grounding of German or English with Arabic led to a slight degradation in performance on word similarity benchmarks. On the other hand, we observed an opposite trend on categorization benchmarks where Arabic had the most improvement on English. In the discussion section, several reasons for those findings are laid out. We hope that our experiments provide a baseline for further research on inter lingual visual grounding.
Search
Co-authors
- Vlad Niculae 1
- Hassan Shahmohammadi 1
- Hendrik P. A. Lensch 1
- Harald Baayen 1
- Sweta Agrawal 1
- show all...