George Tambouratzis


pdf bib
Evaluating Corpus Cleanup Methods in the WMT’22 News Translation Task
Marilena Malli | George Tambouratzis
Proceedings of the Seventh Conference on Machine Translation (WMT)

This submission to the WMT22 General MT Task consists of translations produced by a series of NMT models for two language pairs: German-to-English and German-to-French. All models are trained using only the parallel training data specified by WMT22; no monolingual training data was used. The models follow the Transformer architecture, employing 8 attention heads and 6 layers in both the encoder and the decoder. It is also worth mentioning that, in order to limit the computational resources used during training, the majority of models were trained for at most 21 epochs. Moreover, the translations submitted to WMT22 were produced using the test data released by WMT22. The aim of our experiments has been to evaluate methods for cleaning up a parallel corpus, to determine whether this leads to a translation model that produces more accurate translations. For each language pair, the base NMT model has been trained on the raw parallel training corpora, while the additional NMT models have been trained on corpora subjected to a special cleaning process with the following tools: Bifixer and Bicleaner. It should be mentioned that the Bicleaner repository does not provide pre-trained classifiers for the above language pairs; consequently, we trained probabilistic dictionaries in order to produce new models. The fundamental differences between the resulting NMT models relate mainly to the quality and quantity of the training data, while there are very few differences in the training parameters. To complete this work, we used the following three software packages: (i) Marian NMT (version v1.11.5), used to train the neural machine translation models, and (ii) Bifixer and (iii) Bicleaner, used to correct and clean the parallel training data.
Concerning the Bifixer and Bicleaner tools, we followed the steps described in the following article: Ramírez-Sánchez, G., Zaragoza-Bernabeu, J., Bañón, M., & Rojas, S. O. (2020). “Bifixer and Bicleaner: two open-source tools to clean your parallel data.” EAMT; as well as in the tools' official GitHub pages.
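The cleanup step above can be illustrated with a deliberately simplified sketch of heuristic parallel-corpus filtering. This is not Bifixer's or Bicleaner's actual logic (those tools apply orthographic fixes and a trained classifier, respectively); the function name, thresholds, and checks below are illustrative assumptions only.

```python
def clean_parallel_corpus(pairs, min_len=1, max_len=100, max_ratio=3.0):
    """Keep (src, tgt) sentence pairs that pass simple heuristic checks.

    A toy stand-in for parallel-corpus cleanup: drops empty or overly
    long sentences, pairs with an implausible length ratio, and exact
    duplicates. Real tools (Bifixer/Bicleaner) are far more elaborate.
    """
    kept = []
    seen = set()
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        s_len, t_len = len(src.split()), len(tgt.split())
        # Length sanity check on both sides.
        if not (min_len <= s_len <= max_len and min_len <= t_len <= max_len):
            continue
        # Reject pairs whose side lengths differ implausibly.
        if max(s_len, t_len) / max(1, min(s_len, t_len)) > max_ratio:
            continue
        # Remove exact duplicate pairs.
        if (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept
```

The intent is only to show the shape of the pipeline: the raw corpus goes in, a smaller, higher-quality corpus comes out, and the NMT models are then trained on both versions for comparison.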


pdf bib
Alignment verification to improve NMT translation towards highly inflectional languages with limited resources
George Tambouratzis
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

The present article discusses how to improve translation quality when using limited training data to translate towards morphologically rich languages. The starting point is a neural MT system, used to train translation models solely on publicly available parallel data. An initial analysis of the translation output has shown that quality is sub-optimal, due mainly to an insufficient amount of training data. To improve translation quality, a hybridized solution is proposed, using an ensemble of relatively simple NMT systems trained with different metrics, combined with an open-source module designed for a low-resource MT system. Experiments with the proposed hybridized method on multiple independent test sets show improvements over both (i) the best individual NMT system and (ii) the standard ensemble provided in the Marian-NMT system. Improvements over Marian-NMT are in many cases statistically significant. Finally, a qualitative analysis of translation results indicates a greater robustness for the hybridized method.


pdf bib
An Overview of the SEBAMAT Project
Reinhard Rapp | George Tambouratzis
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

SEBAMAT (semantics-based MT) is a Marie Curie project intended to contribute to the state of the art in machine translation (MT). Current MT systems typically take the semantics of a text into account only insofar as it is implicit in the underlying text corpora or dictionaries. It has occasionally been argued that it may be difficult to advance MT quality to the next level as long as systems do not make more explicit use of semantic knowledge. SEBAMAT aims to evaluate three approaches to incorporating such knowledge into MT.


pdf bib
Linguistically Inspired Language Model Augmentation for MT
George Tambouratzis | Vasiliki Pouli
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The present article reports on efforts to improve the translation accuracy of a corpus-based Machine Translation (MT) system. An error analysis performed on past translation outputs indicated that translation accuracy can likely be improved by augmenting the coverage of the Target-Language (TL) side language model. The method adopted for improving the language model is first presented, based on the concatenation of consecutive phrases. The algorithmic steps that form the process for augmenting the language model are then described. The key idea is to augment the language model only for the most frequent phrase sequences, as counted over a TL-side corpus, in order to maximize the cases covered by the new language model entries. Experiments presented in the article show that substantial improvements in translation accuracy are achieved via the proposed method, when integrating the expanded language model into the corpus-based MT system.
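The frequency-based selection of phrase concatenations described above can be sketched in simplified form as follows. The function name, the representation of the corpus as lists of phrase strings, and the top-k cutoff are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter

def frequent_phrase_concatenations(phrase_corpus, top_k=3):
    """Count consecutive phrase pairs over a phrase-segmented TL corpus
    and return the top_k concatenations as candidate new LM entries.

    phrase_corpus: iterable of sentences, each a list of phrase strings.
    """
    counts = Counter()
    for sentence_phrases in phrase_corpus:
        # Count each pair of adjacent phrases within a sentence.
        for left, right in zip(sentence_phrases, sentence_phrases[1:]):
            counts[(left, right)] += 1
    # Keep only the most frequent concatenations, so the augmented
    # language model covers the largest number of cases per new entry.
    return [" ".join(pair) for pair, _ in counts.most_common(top_k)]
```

Ranking by corpus frequency before adding entries is the point of the method: it bounds the growth of the language model while targeting the sequences most likely to recur at translation time.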


pdf bib
Establishing sentential structure via realignments from small parallel corpora
George Tambouratzis | Vassiliki Pouli
Proceedings of the Fourth Workshop on Hybrid Approaches to Translation (HyTra)


pdf bib
Comparing CRF and template-matching in phrasing tasks within a Hybrid MT system
George Tambouratzis
Proceedings of the 3rd Workshop on Hybrid Approaches to Machine Translation (HyTra)

pdf bib
Expanding the Language model in a low-resource hybrid MT system
George Tambouratzis | Sokratis Sofianopoulos | Marina Vassiliou
Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation


pdf bib
A Review of the PRESEMT project
George Tambouratzis | Marina Vassiliou | Sokratis Sofianopoulos
Proceedings of Machine Translation Summit XIV: European projects

pdf bib
Language-independent hybrid MT with PRESEMT
George Tambouratzis | Sokratis Sofianopoulos | Marina Vassiliou
Proceedings of the Second Workshop on Hybrid Approaches to Translation


pdf bib
Evaluating the Translation Accuracy of a Novel Language-Independent MT Methodology
George Tambouratzis | Sokratis Sofianopoulos | Marina Vassiliou
Proceedings of COLING 2012

pdf bib
PRESEMT: Pattern Recognition-based Statistically Enhanced MT
George Tambouratzis | Marina Vassiliou | Sokratis Sofianopoulos
Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)

pdf bib
Implementing a Language-Independent MT Methodology
Sokratis Sofianopoulos | Marina Vassiliou | George Tambouratzis
Proceedings of the First Workshop on Multilingual Modeling


pdf bib
A resource-light phrase scheme for language-portable MT
George Tambouratzis | Fotini Simistira | Sokratis Sofianopoulos | Nikos Tsimboukakis | Marina Vassiliou
Proceedings of the 15th Annual Conference of the European Association for Machine Translation


pdf bib
Using Patterns for Machine Translation
Stella Markantonatou | Sokratis Sofianopoulos | Vassiliki Spilioti | George Tambouratzis | Marina Vassiliou | Olga Yannoutsou
Proceedings of the 11th Annual Conference of the European Association for Machine Translation


pdf bib
Using monolingual corpora for statistical machine translation: the METIS system
Yannis Dologlou | Stella Markantonatou | George Tambouratzis | Olga Yannoutsou | Athanassia Fourla | Nikos Iannou
EAMT Workshop: Improving MT through other language technology tools: resources and tools for building MT


pdf bib
Automatic Style Categorisation of Corpora in the Greek Language
George Tambouratzis | Stella Markantonatou | Nikolaos Hairetakis | George Carayannis
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

pdf bib
Discriminating the registers and styles in the Modern Greek language
George Tambouratzis | Stella Markantonatou | Nikolaos Hairetakis | Marina Vassiliou | Dimitrios Tambouratzis | George Carayannis
The Workshop on Comparing Corpora