Caroline Lavecchia


2009

Word- and Sentence-Level Confidence Measures for Machine Translation
Sylvain Raybaud | Caroline Lavecchia | David Langlois | Kamel Smaïli
Proceedings of the 13th Annual Conference of the European Association for Machine Translation

2008

Une alternative aux modèles de traduction statistique d’IBM: Les triggers inter-langues
Caroline Lavecchia | Kamel Smaïli | David Langlois
Actes de la 15ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

In this article, we present a new approach to machine translation based on inter-lingual triggers. We first explain the concept of inter-lingual triggers and how they are determined. We then present the experiments carried out with these triggers in order to integrate them as effectively as possible into a complete machine translation process. To this end, we build translation tables from the inter-lingual triggers using several methods. We then compare our trigger-based translation system with a state-of-the-art system based on IBM Model 3 (Brown et al., 1993). The tests show that the translations generated by our system improve the BLEU score (Papineni et al., 2001) by 2.4% compared with those produced by the state-of-the-art system.

Phrase-Based Machine Translation based on Simulated Annealing
Caroline Lavecchia | David Langlois | Kamel Smaïli
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper, we propose a new phrase-based translation model based on inter-lingual triggers. The originality of our method is twofold. First, we identify common source phrases. Then we use inter-lingual triggers to retrieve their translations. Furthermore, we treat the extraction of phrase translations as an optimization problem: we use a simulated annealing algorithm to find the best phrase translations among all those determined by inter-lingual triggers. The best phrases are those that improve translation quality in terms of BLEU score. Tests are run on movie subtitle corpora. They show that our phrase-based machine translation (PBMT) system outperforms a state-of-the-art PBMT system by almost 7 points.

2007

Building a bilingual dictionary from movie subtitles based on inter-lingual triggers
Caroline Lavecchia | Kamel Smaïli | David Langlois
Proceedings of Translating and the Computer 29

2006

Linguistic features modeling based on Partial New Cache
Kamel Smaïli | Caroline Lavecchia | Jean-Paul Haton
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Agreement in gender and number is a critical problem in statistical language modeling. One of the main problems in French speech recognition is the presence of misrecognized words caused by incorrect agreement (in gender and number) between words. Statistical language models do not treat this phenomenon directly. This paper focuses on how to handle agreement. We introduce an original model called Features-Cache (FC) to estimate the gender and number of the word to predict. It is a dynamic, variable-length cache whose size is determined by syntagm delimiters. The model requires no syntactic parsing and is used like any other statistical language model. Several models were implemented, and the best one improves perplexity by more than 8 points.