Amirhossein Tebbifakhr

2026

CONGRAD: Conflicting Gradient Filtering for Multilingual Preference Alignment
Jiangnan Li | Thuy-Trang Vu | Christian Herold | Amirhossein Tebbifakhr | Shahram Khadivi | Gholamreza Haffari
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Naive joint training of large language models (LLMs) for multilingual preference alignment can suffer from negative interference. This is a known issue in multilingual training, where conflicting objectives degrade overall performance. However, the impact of this phenomenon in the context of multilingual preference alignment remains largely underexplored. To address this issue, we propose ConGrad, an effective and scalable filtering method that mitigates this interference by identifying and selecting preference samples that exhibit high cross-lingual affinity. Based on principles of multi-objective optimization, our approach computes an aggregated, cross-lingually beneficial gradient direction and uses this to filter for samples whose individual gradients align with this consensus direction. To ensure scalability for LLMs, we incorporate a sublinear gradient compression strategy that reduces memory overhead during gradient accumulation. We integrate ConGrad into a self-rewarding framework and evaluate on LLaMA3-8B and Gemma2-2B across 10 languages. Results show that ConGrad consistently outperforms strong baselines in both seen and unseen languages, with minimal alignment tax.

2020

pdf bib abs

Machine-oriented NMT Adaptation for Zero-shot NLP tasks: Comparing the Usefulness of Close and Distant Languages
Amirhossein Tebbifakhr | Matteo Negri | Marco Turchi
Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects

Neural Machine Translation (NMT) models are typically trained by considering humans as end-users and maximizing human-oriented objectives. However, in some scenarios, their output is consumed by automatic NLP components rather than by humans. In these scenarios, translations’ quality is measured in terms of their “fitness for purpose” (i.e. maximizing performance of external NLP tools) rather than in terms of standard human fluency/adequacy criteria. Recently, reinforcement learning techniques exploiting the feedback from downstream NLP tools have been proposed for “machine-oriented” NMT adaptation. In this work, we tackle the problem in a multilingual setting where a single NMT model translates from multiple languages for downstream automatic processing in the target language. Knowledge sharing across close and distant languages allows to apply our machine-oriented approach in the zero-shot setting where no labeled data for the test language is seen at training time. Moreover, we incorporate multi-lingual BERT in the source side of our NMT system to benefit from the knowledge embedded in this model. Our experiments show coherent performance gains, for different language directions over both i) “generic” NMT models (trained for human consumption), and ii) fine-tuned multilingual BERT. This gain for zero-shot language directions (e.g. Spanish–English) is higher when the models are fine-tuned on a closely-related source language (Italian) than a distant one (German).

pdf bib abs

Automatic Translation for Multiple NLP tasks: a Multi-task Approach to Machine-oriented NMT Adaptation
Amirhossein Tebbifakhr | Matteo Negri | Marco Turchi
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

Although machine translation (MT) traditionally pursues “human-oriented” objectives, humans are not the only possible consumers of MT output. For instance, when automatic translations are used to feed downstream Natural Language Processing (NLP) components in cross-lingual settings, they should ideally pursue “machine-oriented” objectives that maximize the performance of these components. Tebbifakhr et al. (2019) recently proposed a reinforcement learning approach to adapt a generic neural MT(NMT) system by exploiting the reward from a downstream sentiment classifier. But what if the downstream NLP tasks to serve are more than one? How to avoid the costs of adapting and maintaining one dedicated NMT system for each task? We address this problem by proposing a multi-task approach to machine-oriented NMT adaptation, which is capable to serve multiple downstream tasks with a single system. Through experiments with Spanish and Italian data covering three different tasks, we show that our approach can outperform a generic NMT system, and compete with single-task models in most of the settings.

2019

pdf bib abs

Effort-Aware Neural Automatic Post-Editing
Amirhossein Tebbifakhr | Matteo Negri | Marco Turchi
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

For this round of the WMT 2019 APE shared task, our submission focuses on addressing the “over-correction” problem in APE. Over-correction occurs when the APE system tends to rephrase an already correct MT output, and the resulting sentence is penalized by a reference-based evaluation against human post-edits. Our intuition is that this problem can be prevented by informing the system about the predicted quality of the MT output or, in other terms, the expected amount of needed corrections. For this purpose, following the common approach in multilingual NMT, we prepend a special token to the beginning of both the source text and the MT output indicating the required amount of post-editing. Following the best submissions to the WMT 2018 APE shared task, our backbone architecture is based on multi-source Transformer to encode both the MT output and the corresponding source text. We participated both in the English-German and English-Russian subtasks. In the first subtask, our best submission improved the original MT output quality up to +0.98 BLEU and -0.47 TER. In the second subtask, where the higher quality of the MT output increases the risk of over-correction, none of our submitted runs was able to improve the MT output.

pdf bib abs

Machine Translation for Machines: the Sentiment Classification Use Case
Amirhossein Tebbifakhr | Luisa Bentivogli | Matteo Negri | Marco Turchi
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We propose a neural machine translation (NMT) approach that, instead of pursuing adequacy and fluency (“human-oriented” quality criteria), aims to generate translations that are best suited as input to a natural language processing component designed for a specific downstream task (a “machine-oriented” criterion). Towards this objective, we present a reinforcement learning technique based on a new candidate sampling strategy, which exploits the results obtained on the downstream task as weak feedback. Experiments in sentiment classification of Twitter data in German and Italian show that feeding an English classifier with “machine-oriented” translations significantly improves its performance. Classification results outperform those obtained with translations produced by general-purpose NMT models as well as by an approach based on reinforcement learning. Moreover, our results on both languages approximate the classification accuracy computed on gold standard English tweets.

pdf bib abs

Data Augmentation for End-to-End Speech Translation: FBK@IWSLT ‘19
Mattia A. Di Gangi | Matteo Negri | Viet Nhat Nguyen | Amirhossein Tebbifakhr | Marco Turchi
Proceedings of the 16th International Conference on Spoken Language Translation

This paper describes FBK’s submission to the end-to-end speech translation (ST) task at IWSLT 2019. The task consists in the “direct” translation (i.e. without intermediate discrete representation) of English speech data derived from TED Talks or lectures into German texts. Our participation had a twofold goal: i) testing our latest models, and ii) eval- uating the contribution to model training of different data augmentation techniques. On the model side, we deployed our recently proposed S-Transformer with logarithmic distance penalty, an ST-oriented adaptation of the Transformer architecture widely used in machine translation (MT). On the training side, we focused on data augmentation techniques recently proposed for ST and automatic speech recognition (ASR). In particular, we exploited augmented data in different ways and at different stages of the process. We first trained an end-to-end ASR system and used the weights of its encoder to initialize the decoder of our ST model (transfer learning). Then, we used an English-German MT system trained on large data to translate the English side of the English-French training set into German, and used this newly-created data as additional training material. Finally, we trained our models using SpecAugment, an augmentation technique that randomly masks portions of the spectrograms in order to make them different at every training epoch. Our synthetic corpus and SpecAugment resulted in an improvement of 5 BLEU points over our baseline model on the test set of MuST-C En-De, reaching the score of 22.3 with a single end-to-end system.

pdf bib

MorphoBERT: a Persian NER System with BERT and Morphological Analysis
Mahdi Mohseni | Amirhossein Tebbifakhr
Proceedings of the First International Workshop on NLP Solutions for Under Resourced Languages (NSURL 2019) co-located with ICNLSP 2019 - Short Papers

2018

pdf bib abs

Multi-source transformer with combined losses for automatic post editing
Amirhossein Tebbifakhr | Ruchit Agrawal | Matteo Negri | Marco Turchi
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

Recent approaches to the Automatic Post-editing (APE) of Machine Translation (MT) have shown that best results are obtained by neural multi-source models that correct the raw MT output by also considering information from the corresponding source sentence. To this aim, we present for the first time a neural multi-source APE model based on the Transformer architecture. Moreover, we employ sequence-level loss functions in order to avoid exposure bias during training and to be consistent with the automatic evaluation metrics used for the task. These are the main features of our submissions to the WMT 2018 APE shared task, where we participated both in the PBSMT subtask (i.e. the correction of MT outputs from a phrase-based system) and in the NMT subtask (i.e. the correction of neural outputs). In the first subtask, our system improves over the baseline up to -5.3 TER and +8.23 BLEU points ranking second out of 11 submitted runs. In the second one, characterized by the higher quality of the initial translations, we report lower but statistically significant gains (up to -0.38 TER and +0.8 BLEU), ranking first out of 10 submissions.

pdf bib

Multi-source Transformer for Automatic Post-Editing
Amirhossein Tebbifakhr | Ruchit Agrawal | Matteo Negri | Marco Turchi
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)