Shahram Khadivi - ACL Anthology

Shahram Khadivi

2026

CONGRAD: Conflicting Gradient Filtering for Multilingual Preference Alignment
Jiangnan Li | Thuy-Trang Vu | Christian Herold | Amirhossein Tebbifakhr | Shahram Khadivi | Gholamreza Haffari
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Naive joint training of large language models (LLMs) for multilingual preference alignment can suffer from negative interference. This is a known issue in multilingual training, where conflicting objectives degrade overall performance. However, the impact of this phenomenon in the context of multilingual preference alignment remains largely underexplored. To address this issue, we propose ConGrad, an effective and scalable filtering method that mitigates this interference by identifying and selecting preference samples that exhibit high cross-lingual affinity. Based on principles of multi-objective optimization, our approach computes an aggregated, cross-lingually beneficial gradient direction and uses this to filter for samples whose individual gradients align with this consensus direction. To ensure scalability for LLMs, we incorporate a sublinear gradient compression strategy that reduces memory overhead during gradient accumulation. We integrate ConGrad into a self-rewarding framework and evaluate on LLaMA3-8B and Gemma2-2B across 10 languages. Results show that ConGrad consistently outperforms strong baselines in both seen and unseen languages, with minimal alignment tax.
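
The filtering step described above can be illustrated with a minimal sketch. It assumes the consensus direction is the plain average of unit-normalised per-language gradients, which is a simplification and not necessarily the multi-objective aggregation used in the paper. Gradient compression is omitted, and all names (`consensus_direction`, `filter_aligned`, the toy gradients) are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine(a, b):
    denom = norm(a) * norm(b)
    return dot(a, b) / denom if denom > 0 else 0.0

def consensus_direction(language_grads):
    """Average the unit-normalised per-language gradients (assumed aggregation)."""
    dim = len(next(iter(language_grads.values())))
    total = [0.0] * dim
    for grad in language_grads.values():
        n = norm(grad)
        if n == 0:
            continue  # a zero gradient contributes no direction
        for i, g in enumerate(grad):
            total[i] += g / n
    return [t / len(language_grads) for t in total]

def filter_aligned(sample_grads, direction, threshold=0.0):
    """Return ids of samples whose gradient cosine with the direction exceeds threshold."""
    return [sid for sid, grad in sample_grads.items()
            if cosine(grad, direction) > threshold]

# Toy 2-dimensional gradients, purely illustrative
language_grads = {"en": [1.0, 0.0], "de": [0.0, 1.0]}
sample_grads = {"s1": [1.0, 1.0], "s2": [-1.0, -0.5], "s3": [1.0, -0.2]}
direction = consensus_direction(language_grads)
kept = filter_aligned(sample_grads, direction)
```

In this toy setting, sample `s2` points away from the averaged direction and is dropped, while `s1` and `s3` are kept.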

Adapting Vision-Language Models for E-commerce Understanding at Scale
Matteo Nulli | Orshulevich Vladimir | Tala Bazazo | Christian Herold | Michael Kozielski | Marcin Mazur | Szymon Tuzel | Cees G. M. Snoek | Seyyed Hadi Hashemi | Omar Javed | Yannick Versley | Shahram Khadivi
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track)

E-commerce product understanding by nature demands strong multimodal comprehension of text, images, and structured attributes. General-purpose Vision–Language Models (VLMs) enable generalizable multimodal latent modelling, yet there is no documented, well-known strategy for adapting them to the attribute-centric, multi-image, and noisy nature of e-commerce data without sacrificing general performance. In this work, we show through a large-scale experimental study how targeted adaptation of general VLMs can substantially improve e-commerce performance while preserving broad multimodal capabilities. Furthermore, we propose a novel, extensive evaluation suite covering deep product understanding, strict instruction following, and dynamic attribute extraction.

2025

ClusComp: A Simple Paradigm for Model Compression and Efficient Finetuning
Baohao Liao | Christian Herold | Seyyed Hadi Hashemi | Stefan Vasilev | Shahram Khadivi | Christof Monz
Findings of the Association for Computational Linguistics: ACL 2025

As large language models (LLMs) scale, model compression is crucial for edge deployment and accessibility. Weight-only quantization reduces model size but suffers from performance degradation at lower bit widths. Moreover, standard finetuning is incompatible with quantized models, and alternative methods often fall short of full finetuning. In this paper, we propose ClusComp, a simple yet effective compression paradigm that clusters weight matrices into codebooks and finetunes them block-by-block. ClusComp (1) achieves superior performance in 2-4 bit quantization, (2) pushes compression to 1-bit while outperforming ultra-low-bit methods with minimal finetuning, and (3) enables efficient finetuning, even surpassing existing quantization-based approaches and rivaling full FP16 finetuning. Notably, ClusComp supports compression and finetuning of 70B LLMs on a single A6000-48GB GPU.

Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation
Stefan Vasilev | Christian Herold | Baohao Liao | Seyyed Hadi Hashemi | Shahram Khadivi | Christof Monz
Findings of the Association for Computational Linguistics: ACL 2025

This paper introduces Unilogit, a novel self-distillation method for machine unlearning in Large Language Models. Unilogit addresses the challenge of selectively forgetting specific information while maintaining overall model utility, a critical task in compliance with data privacy regulations like GDPR. Unlike prior methods that rely on static hyperparameters or starting model outputs, Unilogit dynamically adjusts target logits to achieve a uniform probability for the target token, leveraging the current model’s outputs for more accurate self-distillation targets. This approach not only eliminates the need for additional hyperparameters but also enhances the model’s ability to approximate the golden targets. Extensive experiments on public benchmarks and an in-house e-commerce dataset demonstrate Unilogit’s superior performance in balancing forget and retain objectives, outperforming state-of-the-art methods such as NPO and UnDIAL. Our analysis further reveals Unilogit’s robustness across various scenarios, highlighting its practical applicability and effectiveness in achieving efficacious machine unlearning.
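
One way to read "adjusting target logits to achieve a uniform probability for the target token" is sketched below. It assumes that only the target token's logit is changed while the other logits stay fixed, so that the softmax probability of the target becomes 1/V for a vocabulary of size V. Solving exp(z_t) / (exp(z_t) + S) = 1/V, with S the sum of exponentials of the other logits, gives z_t = log S - log(V - 1). The function names are hypothetical, and this is not claimed to be the paper's exact procedure.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def uniform_target_logits(logits, target_index):
    """Return a copy of logits whose target entry yields probability 1/V after softmax."""
    v = len(logits)
    if v < 2:
        raise ValueError("need at least two vocabulary entries")
    others = [z for i, z in enumerate(logits) if i != target_index]
    m = max(others)
    # log-sum-exp over the non-target logits
    log_sum_others = m + math.log(sum(math.exp(z - m) for z in others))
    adjusted = list(logits)
    adjusted[target_index] = log_sum_others - math.log(v - 1)
    return adjusted

current_logits = [2.0, 0.5, -1.0, 3.0]  # toy outputs of the current model
targets = softmax(uniform_target_logits(current_logits, target_index=3))
```

Because the other logits are untouched, their relative probabilities are preserved, and the resulting distribution could serve as a distillation target.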

Domain Adaptation of Foundation LLMs for e-Commerce
Christian Herold | Michael Kozielski | Tala Bazazo | Pavel Petrushkov | Yannick Versley | Seyyed Hadi Hashemi | Patrycja Cieplicka | Dominika Basaj | Shahram Khadivi
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)

We present the e-Llama models: 8 billion and 70 billion parameter large language models that are adapted towards the e-commerce domain. These models are meant as foundation models with deep knowledge about e-commerce, that form a base for instruction- and fine-tuning. The e-Llama models are obtained by continuously pretraining the Llama 3.1 base models on 1 trillion tokens of domain-specific data. We discuss our approach and motivate our choice of hyperparameters with a series of ablation studies. To quantify how well the models have been adapted to the e-commerce domain, we define and implement a set of multilingual, e-commerce specific evaluation tasks. We show that, when carefully choosing the training setup, the Llama 3.1 models can be adapted towards the new domain without sacrificing significant performance on general domain tasks. We also explore the possibility of merging the adapted model and the base model for a better control of the performance trade-off between domains.

2024

IKUN for WMT24 General MT Task: LLMs Are Here for Multilingual Machine Translation
Baohao Liao | Christian Herold | Shahram Khadivi | Christof Monz
Proceedings of the Ninth Conference on Machine Translation

This paper introduces two multilingual systems, IKUN and IKUN-C, developed for the general machine translation task in WMT24. IKUN and IKUN-C represent an open system and a constrained system, respectively, built on Llama-3-8b and Mistral-7B-v0.3. Both systems are designed to handle all 11 language directions using a single model. According to automatic evaluation metrics, IKUN-C achieved 6 first-place and 3 second-place finishes among all constrained systems, while IKUN secured 1 first-place and 2 second-place finishes across both open and constrained systems. These encouraging results suggest that large language models (LLMs) are nearing the level of proficiency required for effective multilingual machine translation. The systems are based on a two-stage approach: first, continuous pre-training on monolingual data in 10 languages, followed by fine-tuning on high-quality parallel data for 11 language directions. The primary difference between IKUN and IKUN-C lies in their monolingual pre-training strategy. IKUN-C is pre-trained using constrained monolingual data, whereas IKUN leverages monolingual data from the OSCAR dataset. In the second phase, both systems are fine-tuned on parallel data sourced from NTREX, Flores, and WMT16-23 for all 11 language pairs.

ApiQ: Finetuning of 2-Bit Quantized Large Language Model
Baohao Liao | Christian Herold | Shahram Khadivi | Christof Monz
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Memory-efficient finetuning of large language models (LLMs) has recently attracted huge attention with the increasing size of LLMs, primarily due to the constraints posed by GPU memory limitations and the effectiveness of these methods compared to full finetuning. Despite the advancements, current strategies for memory-efficient finetuning, such as QLoRA, exhibit inconsistent performance across diverse bit-width quantizations and multifaceted tasks. This inconsistency largely stems from the detrimental impact of the quantization process on preserved knowledge, leading to catastrophic forgetting and undermining the utilization of pretrained models for finetuning purposes. In this work, we introduce a novel quantization framework named ApiQ, designed to restore the lost information from quantization by concurrently initializing the LoRA components and quantizing the weights of LLMs. This approach ensures the maintenance of the original LLM’s activation precision while mitigating the error propagation from shallower into deeper layers. Through comprehensive evaluations conducted on a spectrum of language tasks with various LLMs, ApiQ demonstrably minimizes activation error during quantization. Consequently, it consistently achieves superior finetuning results across various bit-widths. Notably, one can even finetune a 2-bit Llama-2-70b with ApiQ on a single NVIDIA A100-80GB GPU without any memory-saving techniques, and achieve promising results.

2023

Document-Level Language Models for Machine Translation
Frithjof Petrick | Christian Herold | Pavel Petrushkov | Shahram Khadivi | Hermann Ney
Proceedings of the Eighth Conference on Machine Translation

Despite the known limitations, most machine translation systems today still operate on the sentence level. One reason for this is that most parallel training data is only sentence-level aligned, without document-level meta information available. In this work, we set out to build context-aware translation systems utilizing document-level monolingual data instead. This can be achieved by combining any existing sentence-level translation model with a document-level language model. We improve existing approaches by leveraging recent advancements in model combination. Additionally, we propose novel weighting techniques that make the system combination more flexible and significantly reduce computational overhead. In a comprehensive evaluation on four diverse translation tasks, we show that our extensions improve document-targeted scores significantly and are also computationally more efficient. However, we also find that in most scenarios, back-translation gives even better results, at the cost of having to re-train the translation system. Finally, we explore language model fusion in the light of recent advancements in large language models. Our findings suggest that there might be strong potential in utilizing large language models via model combination.

Multi-armed bandits for resource efficient, online optimization of language model pre-training: the use case of dynamic masking
Inigo Urteaga | Moulay Zaidane Draidia | Tomer Lancewicki | Shahram Khadivi
Findings of the Association for Computational Linguistics: ACL 2023

We design and evaluate a Bayesian optimization framework for resource efficient pre-training of Transformer-based language models (TLMs). TLM pre-training requires high computational resources and introduces many unresolved design choices, such as selecting its pre-training hyperparameters. We propose a multi-armed bandit framework for the sequential selection of pre-training hyperparameters, aimed at optimizing language model performance, in a resource efficient manner. We design a Thompson sampling algorithm, with a surrogate Gaussian process reward model of the Masked Language Model (MLM) pre-training objective, for its sequential minimization. Instead of MLM pre-training with fixed masking probabilities, the proposed Gaussian process-based Thompson sampling (GP-TS) accelerates pre-training by sequentially selecting masking hyperparameters that improve performance. We empirically demonstrate how GP-TS pre-trains language models efficiently, i.e., it achieves lower MLM loss in fewer epochs, across a variety of settings. In addition, GP-TS pre-trained TLMs attain competitive downstream performance, while avoiding expensive hyperparameter grid search. GP-TS provides an interactive framework for efficient and optimized TLM pre-training that, by circumventing costly hyperparameter selection, enables substantial computational savings.
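
The sequential selection loop can be sketched in simplified form. The sketch below swaps the Gaussian process surrogate for independent Gaussian posteriors over a small, fixed set of masking probabilities, so it is plain Gaussian Thompson sampling rather than GP-TS. The reward function is a hypothetical stand-in for the negative MLM loss observed after a pre-training interval, and all names and constants are assumptions.

```python
import random

def gaussian_posterior(rewards, prior_mean, prior_var, noise_var):
    """Posterior mean and variance of an arm's mean reward (known noise variance)."""
    precision = 1.0 / prior_var + len(rewards) / noise_var
    mean = (prior_mean / prior_var + sum(rewards) / noise_var) / precision
    return mean, 1.0 / precision

def thompson_sampling(arms, reward_fn, rounds, rng,
                      prior_mean=0.0, prior_var=1.0, noise_var=0.01):
    """Each round: sample from every arm's posterior, play the arm with the best draw."""
    history = {arm: [] for arm in arms}
    for _ in range(rounds):
        draws = {}
        for arm in arms:
            mean, var = gaussian_posterior(history[arm], prior_mean, prior_var, noise_var)
            draws[arm] = rng.gauss(mean, var ** 0.5)
        chosen = max(draws, key=draws.get)
        history[chosen].append(reward_fn(chosen, rng))
    return history

def toy_reward(mask_prob, rng):
    # Hypothetical reward: peaks at a masking probability of 0.15, plus noise
    return -(mask_prob - 0.15) ** 2 + rng.gauss(0.0, 0.1)

rng = random.Random(0)  # fixed seed for reproducibility
history = thompson_sampling([0.05, 0.15, 0.30, 0.50], toy_reward, rounds=200, rng=rng)
pulls = {arm: len(rewards) for arm, rewards in history.items()}
```

A Gaussian process surrogate would additionally share information between nearby masking probabilities, which this independent-arm simplification does not capture.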

Probabilistic Robustness for Data Filtering
Yu Yu | Abdul Rafae Khan | Shahram Khadivi | Jia Xu
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

We introduce our probabilistic robustness rewarded data optimization (PRoDO) approach as a framework to enhance the model’s generalization power by selecting training data that optimizes our probabilistic robustness metrics. We use proximal policy optimization (PPO) reinforcement learning to approximately solve the computationally intractable training subset selection problem. The PPO’s reward is defined as our (𝛼, 𝜖, 𝛾)-Robustness that measures performance consistency over multiple domains by simulating unknown test sets in real-world scenarios using a leaving-one-out strategy. We demonstrate that our PRoDO effectively filters data that lead to significantly higher prediction accuracy and robustness on unknown-domain test sets. Our experiments achieve up to +17.2% increase of accuracy (+25.5% relatively) in sentiment analysis, and -28.05 decrease of perplexity (-32.1% relatively) in language modeling. In addition, our probabilistic (𝛼, 𝜖, 𝛾)-Robustness definition serves as an evaluation metric with higher levels of agreement with human annotations than typical performance-based metrics.

2022

Can Domains Be Transferred across Languages in Multi-Domain Multilingual Neural Machine Translation?
Thuy-Trang Vu | Shahram Khadivi | Xuanli He | Dinh Phung | Gholamreza Haffari
Proceedings of the Seventh Conference on Machine Translation (WMT)

Previous works mostly focus on either multilingual or multi-domain aspects of neural machine translation (NMT). This paper investigates whether the domain information can be transferred across languages on the composition of multi-domain and multilingual NMT, particularly for the incomplete data condition where in-domain bitext is missing for some language pairs. Our results in the curated leave-one-domain-out experiments show that multi-domain multilingual (MDML) NMT can boost zero-shot translation performance by up to +10 BLEU, as well as aid the generalisation of multi-domain NMT to the missing domain. We also explore strategies for effective integration of multilingual and multi-domain NMT, including language and domain tag combination and auxiliary task training. We find that learning domain-aware representations and adding target-language tags to the encoder leads to effective MDML-NMT.

A Preordered RNN Layer Boosts Neural Machine Translation in Low Resource Settings
Mohaddeseh Bastan | Shahram Khadivi
Proceedings of the Fifth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2022)

Neural Machine Translation (NMT) models are strong enough to convey semantic and syntactic information from the source language to the target language. However, these models need a large amount of data to learn their parameters. As a result, for languages with scarce data, these models are at risk of underperforming. We propose to augment attention-based neural networks with reordering information to alleviate the lack of data. This augmentation improves the translation quality for both English to Persian and Persian to English by up to 6% BLEU absolute over the baseline models.

Domain Generalisation of NMT: Fusing Adapters with Leave-One-Domain-Out Training
Thuy-Trang Vu | Shahram Khadivi | Dinh Phung | Gholamreza Haffari
Findings of the Association for Computational Linguistics: ACL 2022

Generalising to unseen domains is under-explored and remains a challenge in neural machine translation. Inspired by recent research in parameter-efficient transfer learning from pretrained models, this paper proposes a fusion-based generalisation method that learns to combine domain-specific parameters. To address the challenge of not knowing the test domain at training time, we propose a leave-one-domain-out training strategy that avoids information leakage. Empirical results on three language pairs show that our proposed fusion method outperforms other baselines by up to +0.8 BLEU on average.
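
The general idea of leave-one-domain-out splitting can be sketched as follows. This is a generic data-splitting helper with hypothetical names and toy data. It only illustrates how each domain in turn is withheld from training, not how the adapter fusion itself is trained in the paper.

```python
def leave_one_domain_out(data_by_domain):
    """Yield (held_out_domain, training_examples, held_out_examples) for each domain."""
    for held_out in data_by_domain:
        train = [example
                 for domain, examples in data_by_domain.items()
                 if domain != held_out
                 for example in examples]
        yield held_out, train, list(data_by_domain[held_out])

# Toy corpus keyed by domain; the sentence ids are placeholders
corpus = {"law": ["a", "b"], "medical": ["c"], "it": ["d"]}
splits = list(leave_one_domain_out(corpus))
```

Each split treats one domain as unseen, so whatever is trained on the remaining domains never observes data from the domain it is evaluated on.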

Proceedings of the Third Workshop on Deep Learning for Low-Resource Natural Language Processing
Colin Cherry | Angela Fan | George Foster | Gholamreza (Reza) Haffari | Shahram Khadivi | Nanyun (Violet) Peng | Xiang Ren | Ehsan Shareghi | Swabha Swayamdipta
Proceedings of the Third Workshop on Deep Learning for Low-Resource Natural Language Processing

Can Data Diversity Enhance Learning Generalization?
Yu Yu | Shahram Khadivi | Jia Xu
Proceedings of the 29th International Conference on Computational Linguistics

This paper introduces our Diversity Advanced Actor-Critic reinforcement learning (A2C) framework (DAAC) to improve the generalization and accuracy of Natural Language Processing (NLP). We show that the diversification of training samples alleviates overfitting and improves model generalization and accuracy. We quantify diversity on a set of samples using the max dispersion, convex hull volume, and graph entropy based on sentence embeddings in high-dimensional metric space. We also introduce A2C to select such a diversified training subset efficiently. Our experiments achieve up to +23.8 accuracy increase (38.0% relatively) in sentiment analysis, -44.7 perplexity decrease (37.9% relatively) in language modeling, and consistent improvements in named entity recognition over various domains. In particular, our method outperforms both domain adaptation and generalization baselines without using any target domain knowledge.

2021

Back-translation for Large-Scale Multilingual Machine Translation
Baohao Liao | Shahram Khadivi | Sanjika Hewavitharana
Proceedings of the Sixth Conference on Machine Translation

This paper illustrates our approach to the shared task on large-scale multilingual machine translation in the sixth conference on machine translation (WMT-21). In this work, we aim to build a single multilingual translation system with a hypothesis that a universal cross-language representation leads to better multilingual translation performance. We extend the exploration of different back-translation methods from bilingual translation to multilingual translation. Better performance is obtained by the constrained sampling method, which differs from the finding for bilingual translation. Besides, we also explore the effect of vocabularies and the amount of synthetic data. Surprisingly, smaller vocabularies perform better, and the extensive monolingual English data offers a modest improvement. We submitted to both small tasks and achieved second place.

Integrated Training for Sequence-to-Sequence Models Using Non-Autoregressive Transformer
Evgeniia Tokarchuk | Jan Rosendahl | Weiyue Wang | Pavel Petrushkov | Tomer Lancewicki | Shahram Khadivi | Hermann Ney
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

Complex natural language applications such as speech translation or pivot translation traditionally rely on cascaded models. However, cascaded models are known to be prone to error propagation and model discrepancy problems. Furthermore, there is no possibility of using end-to-end training data in conventional cascaded systems, meaning that the training data most suited for the task cannot be used. Previous studies suggested several approaches for integrated end-to-end training to overcome those problems; however, they mostly rely on (synthetic or natural) three-way data. We propose a cascaded model based on the non-autoregressive Transformer that enables end-to-end training without the need for an explicit intermediate representation. This new architecture (i) avoids unnecessary early decisions that can cause errors which are then propagated throughout the cascaded models and (ii) utilizes the end-to-end training data directly. We conduct an evaluation on two pivot-based machine translation tasks, namely French→German and German→Czech. Our experimental results show that the proposed architecture yields an improvement of more than 2 BLEU for French→German over the cascaded baseline.

2020

Diving Deep into Context-Aware Neural Machine Translation
Jingjing Huo | Christian Herold | Yingbo Gao | Leonard Dahlmann | Shahram Khadivi | Hermann Ney
Proceedings of the Fifth Conference on Machine Translation

Context-aware neural machine translation (NMT) is a promising direction to improve the translation quality by making use of the additional context, e.g., document-level translation, or having meta-information. Although there exist various architectures and analyses, the effectiveness of different context-aware NMT models is not well explored yet. This paper analyzes the performance of document-level NMT models on four diverse domains with a varied amount of parallel document-level bilingual data. We conduct a comprehensive set of experiments to investigate the impact of document-level NMT. We find that there is no single best approach to document-level NMT, but rather that different architectures come out on top on different tasks. Looking at task-specific problems, such as pronoun resolution or headline translation, we find improvements in the context-aware systems, even in cases where the corpus-level metrics like BLEU show no significant improvement. We also show that document-level back-translation significantly helps to compensate for the lack of document-level bi-texts.

2019

Generalizing Back-Translation in Neural Machine Translation
Miguel Graça | Yunsu Kim | Julian Schamper | Shahram Khadivi | Hermann Ney
Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers)

Back-translation — data augmentation by translating target monolingual data — is a crucial component in modern neural machine translation (NMT). In this work, we reformulate back-translation in the scope of cross-entropy optimization of an NMT model, clarifying its underlying mathematical assumptions and approximations beyond its heuristic usage. Our formulation covers broader synthetic data generation schemes, including sampling from a target-to-source NMT model. With this formulation, we point out fundamental problems of the sampling-based approaches and propose to remedy them by (i) disabling label smoothing for the target-to-source model and (ii) sampling from a restricted search space. Our statements are investigated on the WMT 2018 German <-> English news translation task.

Learning Bilingual Sentence Embeddings via Autoencoding and Computing Similarities with a Multilayer Perceptron
Yunsu Kim | Hendrik Rosendahl | Nick Rossenbach | Jan Rosendahl | Shahram Khadivi | Hermann Ney
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)

We propose a novel model architecture and training algorithm to learn bilingual sentence embeddings from a combination of parallel and monolingual data. Our method connects autoencoding and neural machine translation to force the source and target sentence embeddings to share the same space without the help of a pivot language or an additional transformation. We train a multilayer perceptron on top of the sentence embeddings to extract good bilingual sentence pairs from nonparallel or noisy parallel data. Our approach shows promising performance on sentence alignment recovery and the WMT 2018 parallel corpus filtering tasks with only a single model.

Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)
Colin Cherry | Greg Durrett | George Foster | Reza Haffari | Shahram Khadivi | Nanyun Peng | Xiang Ren | Swabha Swayamdipta
Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)

Pivot-based Transfer Learning for Neural Machine Translation between Non-English Languages
Yunsu Kim | Petre Petrov | Pavel Petrushkov | Shahram Khadivi | Hermann Ney
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We present effective pre-training strategies for neural machine translation (NMT) using parallel corpora involving a pivot language, i.e., source-pivot and pivot-target, leading to a significant improvement in source-target translation. We propose three methods to increase the relation among source, pivot, and target languages in the pre-training: 1) step-wise training of a single model for different language pairs, 2) additional adapter component to smoothly connect pre-trained encoder and decoder, and 3) cross-lingual encoder training via autoencoding of the pivot language. Our methods greatly outperform multilingual models by up to +2.6% BLEU in WMT 2019 French-German and German-Czech tasks. We show that our improvements are valid also in zero-shot/zero-resource scenarios.

2018

Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP
Reza Haffari | Colin Cherry | George Foster | Shahram Khadivi | Bahar Salehi
Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP

Learning from Chunk-based Feedback in Neural Machine Translation
Pavel Petrushkov | Shahram Khadivi | Evgeny Matusov
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We empirically investigate learning from partial feedback in neural machine translation (NMT), when partial feedback is collected by asking users to highlight a correct chunk of a translation. We propose a simple and effective way of utilizing such feedback in NMT training. We demonstrate how the common machine translation problem of domain mismatch between training and deployment can be reduced solely based on chunk-level user feedback. We conduct a series of simulation experiments to test the effectiveness of the proposed method. Our results show that chunk-level feedback outperforms sentence-based feedback by up to 2.61% BLEU absolute.

Can Neural Machine Translation be Improved with User Feedback?
Julia Kreutzer | Shahram Khadivi | Evgeny Matusov | Stefan Riezler
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)

We present the first real-world application of methods for improving neural machine translation (NMT) with human reinforcement, based on explicit and implicit user feedback collected on the eBay e-commerce platform. Previous work has been confined to simulation experiments, whereas in this paper we work with real logged feedback for offline bandit learning of NMT parameters. We conduct a thorough analysis of the available explicit user judgments—five-star ratings of translation quality—and show that they are not reliable enough to yield significant improvements in bandit learning. In contrast, we successfully utilize implicit task-based feedback collected in a cross-lingual search task to improve task-specific and machine translation quality metrics.

Word-based Domain Adaptation for Neural Machine Translation
Shen Yan | Leonard Dahlmann | Pavel Petrushkov | Sanjika Hewavitharana | Shahram Khadivi
Proceedings of the 15th International Conference on Spoken Language Translation

In this paper, we empirically investigate applying word-level weights to adapt neural machine translation to e-commerce domains, where small e-commerce datasets and large out-of-domain datasets are available. In order to mine in-domain-like words in the out-of-domain datasets, we compute word weights by using a domain-specific and a non-domain-specific language model followed by smoothing and binary quantization. The baseline model is trained on mixed in-domain and out-of-domain datasets. Experimental results on En → Zh e-commerce domain translation show that compared to continuing training without word weights, it improves MT quality by up to 3.11% BLEU absolute and 1.59% TER. We have also trained models using fine-tuning on the in-domain data. Pre-training a model with word weights improves fine-tuning by up to 1.24% BLEU absolute and 1.64% TER, respectively.
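
The word-weighting pipeline (language model scores, smoothing, binary quantization) can be sketched as below. The sketch assumes the raw score of a word is the difference between its in-domain and general-domain log-probabilities, that smoothing is a moving average over neighbouring words, and that quantization is a simple threshold. These choices, the function name, and the toy numbers are assumptions rather than details confirmed by the abstract.

```python
def word_weights(in_domain_logprobs, general_logprobs, window=1, threshold=0.0):
    """Return a 0/1 weight per word from two per-word log-probability sequences."""
    # Raw score: how much more likely the word is under the in-domain model
    diffs = [a - b for a, b in zip(in_domain_logprobs, general_logprobs)]
    # Smoothing: average each score with up to `window` neighbours on each side
    smoothed = []
    for i in range(len(diffs)):
        lo = max(0, i - window)
        hi = min(len(diffs), i + window + 1)
        smoothed.append(sum(diffs[lo:hi]) / (hi - lo))
    # Binary quantization: keep words whose smoothed score exceeds the threshold
    return [1 if s > threshold else 0 for s in smoothed]

# Toy per-word log-probabilities for a four-word sentence
in_domain = [-2.0, -1.0, -5.0, -6.0]
general = [-3.0, -3.0, -4.0, -4.0]
weights = word_weights(in_domain, general)
```

The resulting binary weights could then scale each target word's contribution to the training loss, so that only in-domain-like words of out-of-domain sentences are learned from.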

2017

Neural Machine Translation Leveraging Phrase-based Models in a Hybrid Search
Leonard Dahlmann | Evgeny Matusov | Pavel Petrushkov | Shahram Khadivi
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

In this paper, we introduce a hybrid search for attention-based neural machine translation (NMT). A target phrase learned with statistical MT models extends a hypothesis in the NMT beam search when the attention of the NMT model focuses on the source words translated by this phrase. Phrases added in this way are scored with the NMT model, but also with SMT features including phrase-level translation probabilities and a target language model. Experimental results on German-to-English news domain and English-to-Russian e-commerce domain translation tasks show that using phrase-based models in NMT search improves MT quality by up to 2.3% BLEU absolute as compared to a strong NMT baseline.

Neural and Statistical Methods for Leveraging Meta-information in Machine Translation
Shahram Khadivi | Patrick Wilken | Leonard Dahlmann | Evgeny Matusov
Proceedings of Machine Translation Summit XVI: Research Track

2016

Guided Alignment Training for Topic-Aware Neural Machine Translation
Wenhu Chen | Evgeny Matusov | Shahram Khadivi | Jan-Thorsten Peter
Conferences of the Association for Machine Translation in the Americas: MT Researchers' Track

In this paper, we propose an effective way for biasing the attention mechanism of a sequence-to-sequence neural machine translation (NMT) model towards the well-studied statistical word alignment models. We show that our novel guided alignment training approach improves translation quality on real-life e-commerce texts consisting of product titles and descriptions, overcoming the problems posed by many unknown words and a large type/token ratio. We also show that meta-data associated with input texts such as topic or category information can significantly improve translation quality when used as an additional signal to the decoder part of the network. With both novel features, the BLEU score of the NMT system on a product title set improves from 18.6% to 21.3%. Even larger MT quality gains are obtained through domain adaptation of a general domain NMT system to e-commerce data. The developed NMT system also performs well on the IWSLT speech translation task, where an ensemble of four variant systems outperforms the phrase-based baseline by 2.1% BLEU absolute.

2015

A Generative Model for Extracting Parallel Fragments from Comparable Documents
Somayeh Bakhshaei | Shahram Khadivi | Reza Safabakhsh
Proceedings of the Eighth Workshop on Building and Using Comparable Corpora

Improved search strategy for interactive predictions in computer-assisted translation
Fatemeh Azadi | Shahram Khadivi
Proceedings of Machine Translation Summit XV: Papers

2014

Graph-Based Semi-Supervised Conditional Random Fields For Spoken Language Understanding Using Unaligned Data
Mohammad Aliannejadi | Masoud Kiaeeha | Shahram Khadivi | Saeed Shiry Ghidary
Proceedings of the Australasian Language Technology Association Workshop 2014

2013

Using Context Vectors in Improving a Machine Translation System with Bridge Language
Samira Tofighi Zahabi | Somayeh Bakhshaei | Shahram Khadivi
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Meta-level Statistical Machine Translation
Sajad Ebrahimi | Kourosh Meshgi | Shahram Khadivi | Mohammad Ebrahim Shiri Ahmad Abady
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2012

A Holistic Approach to Bilingual Sentence Fragment Extraction from Comparable Corpora
Mahdi Khademian | Kaveh Taghipour | Saab Mansour | Shahram Khadivi
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Achieving accurate translation with statistical machine translation systems, especially for multiple-domain documents, requires ever more bilingual text, and this need becomes more critical when training such systems for language pairs with scarce training data. In recent years, some research has explored a new source of parallel text: documents that are not necessarily parallel but are comparable. Since these methods search for possible translation equivalences in a greedy manner, they are unable to consider all possible parallel texts in comparable documents. This paper investigates a different approach to this need by considering relationships between all words of two comparable documents, which works fairly well even in the worst case of comparability. We represent each document pair as a matrix and then transform it to a new space to find parallel fragments. Evaluations show that the system is successful in extracting useful fragment pairs.

A New Search Approach for Interactive-Predictive Computer-Assisted Translation
Zeinab Vakil | Shahram Khadivi
Proceedings of COLING 2012: Posters

Interactive-predictive speech-enabled computer-assisted translation
Shahram Khadivi | Zeinab Vakil
Proceedings of the 9th International Workshop on Spoken Language Translation: Papers

In this paper, we study the incorporation of statistical machine translation models into automatic speech recognition models in the framework of computer-assisted translation. The system is given a source-language text to be translated and shows it to the human translator, who translates it orally. The system captures the user's speech, which is the dictation of the target-language sentence. Then, the human translator uses an interactive-predictive process to correct the system-generated errors. We demonstrate the efficiency of this method through a higher human productivity gain than the baseline systems: a pure ASR system and integrated ASR and MT systems.

Developing an Open-domain English-Farsi Translation System Using AFEC: Amirkabir Bilingual Farsi-English Corpus
Fattaneh Jabbari | Somayeh Bakshaei | Seyyed Mohammad Mohammadzadeh Ziabary | Shahram Khadivi
Fourth Workshop on Computational Approaches to Arabic-Script-based Languages

The translation quality of Statistical Machine Translation (SMT) depends on the amount of input data, especially for morphologically rich languages. Farsi (Persian) is such a language, and it has few NLP resources. It also suffers from non-standard written characters, which cause wide variation in the written form of each character. Moreover, the structural difference between Farsi and English results in long-range reorderings that cannot be modeled by common SMT reordering models. Here, we try to improve the existing English-Farsi SMT system by addressing these challenges, first by expanding our bilingual limited-domain corpus to an open-domain one. Then, to alleviate the character variations, a new text normalization algorithm is proposed. Finally, some hand-crafted rules are applied to reduce the structural differences. Using the new corpus, the experiments showed an 8.82% BLEU improvement when applying the new normalization method and 9.1% BLEU when the rules are used.

2011

The Amirkabir Machine Transliteration System for NEWS 2011: Farsi-to-English Task
Najmeh Mousavi Nejad | Shahram Khadivi | Kaveh Taghipour
Proceedings of the 3rd Named Entities Workshop (NEWS 2011)

An Unsupervised Alignment Model for Sequence Labeling: Application to Name Transliteration
Najmeh Mousavi Nejad | Shahram Khadivi
Proceedings of the 3rd Named Entities Workshop (NEWS 2011)

Parallel Corpus Refinement as an Outlier Detection Algorithm
Kaveh Taghipour | Shahram Khadivi | Jia Xu
Proceedings of Machine Translation Summit XIII: Papers

2010

WordNet Based Features for Predicting Brain Activity associated with meanings of nouns
Ahmad Babaeian Jelodar | Mehrdad Alizadeh | Shahram Khadivi
Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics

2009

Statistical Approaches to Computer-Assisted Translation
Sergio Barrachina | Oliver Bender | Francisco Casacuberta | Jorge Civera | Elsa Cubel | Shahram Khadivi | Antonio Lagarda | Hermann Ney | Jesús Tomás | Enrique Vidal | Juan-Miguel Vilar
Computational Linguistics, Volume 35, Number 1, March 2009

2007

A Sequence Alignment Model Based on the Averaged Perceptron
Dayne Freitag | Shahram Khadivi
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

Morpho-syntactic Arabic Preprocessing for Arabic to English Statistical Machine Translation
Anas El Isbihani | Shahram Khadivi | Oliver Bender | Hermann Ney
Proceedings on the Workshop on Statistical Machine Translation

Integration of Speech to Computer-Assisted Translation Using Finite-State Automata
Shahram Khadivi | Richard Zens | Hermann Ney
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

A Flexible Architecture for CAT Applications
Saša Hasan | Shahram Khadivi | Richard Zens | Hermann Ney
Proceedings of the 11th Annual Conference of the European Association for Machine Translation

2005

The RWTH Phrase-based Statistical Machine Translation System
Richard Zens | Oliver Bender | Sasa Hasan | Shahram Khadivi | Evgeny Matusov | Jia Xu | Yuqi Zhang | Hermann Ney
Proceedings of the Second International Workshop on Spoken Language Translation

Co-authors

Leonard Dahlmann 4

Seyyed Hadi Hashemi 4

Christof Monz 4

Oliver Bender 3

George Foster 3

Kaveh Taghipour 3

Somayeh Bakhshaei 2

Sanjika Hewavitharana 2

Michael Kozielski 2

Tomer Lancewicki 2

Najmeh Mousavi Nejad 2

Jan Rosendahl 2

Swabha Swayamdipta 2

Stefan Vasilev 2

Yannick Versley 2

Mohammad Aliannejadi 1

Mehrdad Alizadeh 1

Fatemeh Azadi 1

Ahmad Babaeian Jelodar 1

Somayeh Bakshaei 1

Sergio Barrachina 1

Dominika Basaj 1

Mohaddeseh Bastan 1

Francisco Casacuberta 1

Patrycja Cieplicka 1

Moulay Zaidane Draidia 1

Sajad Ebrahimi 1

Dayne Freitag 1

Saeed Shiry Ghidary 1

Miguel Graça 1

Gholamreza (Reza) Haffari 1

Anas El Isbihani 1

Fattaneh Jabbari 1

Mahdi Khademian 1

Abdul Rafae Khan 1

Masoud Kiaeeha 1

Julia Kreutzer 1

Antonio-L. Lagarda 1

Kourosh Meshgi 1

Seyyed Mohammad Mohammadzadeh Ziabary 1

Nanyun (Violet) Peng 1

Jan-Thorsten Peter 1

Frithjof Petrick 1

Stefan Riezler 1

Hendrik Rosendahl 1

Nick Rossenbach 1

Reza Safabakhsh 1

Julian Schamper 1

Ehsan Shareghi 1

Mohammad Ebrahim Shiri Ahmad Abady 1

Amirhossein Tebbifakhr 1

Samira Tofighi Zahabi 1

Evgeniia Tokarchuk 1

Jesús Tomás 1

Inigo Urteaga 1

Enrique Vidal 1

Juan Miguel Vilar 1

Orshulevich Vladimir 1

Patrick Wilken 1

Venues