Samy Hedaya


2019

Human-Informed Speakers and Interpreters Analysis in the WAW Corpus and an Automatic Method for Calculating Interpreters’ Décalage
Irina Temnikova | Ahmed Abdelali | Souhila Djabri | Samy Hedaya
Proceedings of the Human-Informed Translation and Interpreting Technology Workshop (HiT-IT 2019)

This article presents a multi-faceted analysis of a subset of interpreted conference speeches from the WAW corpus for the English-Arabic language pair. We analyze several speaker and interpreter variables via manual annotation and automatic methods. We propose a new automatic method for calculating interpreters’ décalage based on Automatic Speech Recognition (ASR) and automatic alignment of named entities and content words between speaker and interpreter. The method is evaluated by two human annotators with expertise in interpreting and Interpreting Studies and shows highly satisfactory results, accompanied by high inter-annotator agreement. We provide insights into the relations between speakers’ variables, interpreters’ variables and décalage, and discuss them from the points of view of Interpreting Studies and interpreting practice. Our findings about interpreters’ behavior need to be extended to a larger number of conference sessions in future research.
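The abstract does not spell out the computation itself. As a minimal sketch only, assuming that ASR output supplies a start time for each word and that speaker and interpreter tokens have already been aligned, décalage could be estimated as the interpreter's onset minus the speaker's onset for each aligned pair. The names `TimedToken` and `decalage`, and the sample timestamps, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TimedToken:
    """A word with its start time in seconds (hypothetical ASR output)."""
    text: str
    start: float


def decalage(aligned_pairs):
    """Return (per-pair lags, mean lag) in seconds.

    aligned_pairs: list of (speaker_token, interpreter_token) tuples,
    assumed to be already aligned (e.g. named entities or content words).
    """
    if not aligned_pairs:
        raise ValueError("at least one aligned pair is required")
    # Lag = how long after the speaker the interpreter utters the aligned word.
    lags = [interp.start - spk.start for spk, interp in aligned_pairs]
    return lags, mean(lags)


# Illustrative, made-up timestamps for two aligned pairs.
pairs = [
    (TimedToken("Doha", 1.0), TimedToken("al-Dawha", 3.5)),
    (TimedToken("research", 4.0), TimedToken("al-bahth", 6.0)),
]
lags, avg = decalage(pairs)
print(lags, avg)
```

Averaging over aligned pairs is one possible design choice; a median or a per-segment value would be equally plausible under the same assumptions.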

2018

The WAW Corpus: The First Corpus of Interpreted Speeches and their Translations for English and Arabic
Ahmed Abdelali | Irina Temnikova | Samy Hedaya | Stephan Vogel
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Interpreting Strategies Annotation in the WAW Corpus
Irina Temnikova | Ahmed Abdelali | Samy Hedaya | Stephan Vogel | Aishah Al Daher
Proceedings of the Workshop Human-Informed Translation and Interpreting Technology

With the aim of teaching our automatic speech-to-text translation system human interpreting strategies, our first step is to identify which interpreting strategies are most often used in our language pair of interest (English-Arabic). In this article, we run an automatic analysis of a corpus of parallel speeches and their human interpretations, and provide the results of manually annotating the human interpreting strategies in a sample of the corpus. We give a glimpse of the corpus, whose value goes beyond its large number of scientific speeches with their interpretations from English into Arabic, as it also provides rich information about the interpreters. We also discuss the difficulties we encountered and our solutions to them: our methodology for manual re-segmentation and alignment of parallel segments, the choice of annotation tool, and the annotation procedure. Our annotation findings explain the previously extracted statistical features specific to the interpreted corpus (compared with a translation corpus) as well as the quality of interpretation provided by different interpreters.