Peyman Passban


2021

pdf bib
Not Far Away, Not So Close: Sample Efficient Nearest Neighbour Data Augmentation via MiniMax
Ehsan Kamalloo | Mehdi Rezagholizadeh | Peyman Passban | Ali Ghodsi
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Revisiting Robust Neural Machine Translation: A Transformer Case Study
Peyman Passban | Puneeth Saladi | Qun Liu
Findings of the Association for Computational Linguistics: EMNLP 2021

Transformers have brought a remarkable improvement in the performance of neural machine translation (NMT) systems but they could be surprisingly vulnerable to noise. In this work, we try to investigate how noise breaks Transformers and if there exist solutions to deal with such issues. There is a large body of work in the NMT literature on analyzing the behavior of conventional models for the problem of noise but Transformers are relatively understudied in this context. Motivated by this, we introduce a novel data-driven technique called Target Augmented Fine-tuning (TAFT) to incorporate noise during training. This idea is comparable to the well-known fine-tuning strategy. Moreover, we propose two other novel extensions to the original Transformer: Controlled Denoising (CD) and Dual-Channel Decoding (DCD), that modify the neural architecture as well as the training process to handle noise. One important characteristic of our techniques is that they only impact the training phase and do not impose any overhead at inference time. We evaluated our techniques to translate the English–German pair in both directions and observed that our models have a higher tolerance to noise. More specifically, they perform with no deterioration where up to 10% of entire test words are infected by noise.

2020

pdf bib
Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers
Yimeng Wu | Peyman Passban | Mehdi Rezagholizadeh | Qun Liu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

With the growth of computing power neural machine translation (NMT) models also grow accordingly and become better. However, they also become harder to deploy on edge devices due to memory constraints. To cope with this problem, a common practice is to distill knowledge from a large and accurately-trained teacher network (T) into a compact student network (S). Although knowledge distillation (KD) is useful in most cases, our study shows that existing KD techniques might not be suitable enough for deep NMT engines, so we propose a novel alternative. In our model, besides matching T and S predictions we have a combinatorial mechanism to inject layer-level supervision from T to S. In this paper, we target low-resource settings and evaluate our translation engines for Portuguese→English, Turkish→English, and English→German directions. Students trained using our technique have 50% fewer parameters and can still deliver comparable results to those of 12-layer teachers.

2018

pdf bib
Improving Character-Based Decoding Using Target-Side Morphological Information for Neural Machine Translation
Peyman Passban | Qun Liu | Andy Way
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Recently, neural machine translation (NMT) has emerged as a powerful alternative to conventional statistical approaches. However, its performance drops considerably in the presence of morphologically rich languages (MRLs). Neural engines usually fail to tackle the large vocabulary and high out-of-vocabulary (OOV) word rate of MRLs. Therefore, it is not suitable to exploit existing word-based models to translate this set of languages. In this paper, we propose an extension to the state-of-the-art model of Chung et al. (2016), which works at the character level and boosts the decoder with target-side morphological information. In our architecture, an additional morphology table is plugged into the model. Each time the decoder samples from a target vocabulary, the table sends auxiliary signals from the most relevant affixes in order to enrich the decoder’s current state and constrain it to provide better predictions. We evaluated our model to translate English into German, Russian, and Turkish as three MRLs and observed significant improvements.

pdf bib
Tailoring Neural Architectures for Translating from Morphologically Rich Languages
Peyman Passban | Andy Way | Qun Liu
Proceedings of the 27th International Conference on Computational Linguistics

A morphologically complex word (MCW) is a hierarchical constituent with meaning-preserving subunits, so word-based models which rely on surface forms might not be powerful enough to translate such structures. When translating from morphologically rich languages (MRLs), a source word could be mapped to several words or even a full sentence on the target side, which means an MCW should not be treated as an atomic unit. In order to provide better translations for MRLs, we boost the existing neural machine translation (NMT) architecture with a double- channel encoder and a double-attentive decoder. The main goal targeted in this research is to provide richer information on the encoder side and redesign the decoder accordingly to benefit from such information. Our experimental results demonstrate that we could achieve our goal as the proposed model outperforms existing subword- and character-based architectures and showed significant improvements on translating from German, Russian, and Turkish into English.

2016

pdf bib
Improving Phrase-Based SMT Using Cross-Granularity Embedding Similarity
Peyman Passban | Chris Hokamp | Andy Way | Qun Liu
Proceedings of the 19th Annual Conference of the European Association for Machine Translation

pdf bib
Enriching Phrase Tables for Statistical Machine Translation Using Mixed Embeddings
Peyman Passban | Qun Liu | Andy Way
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

The phrase table is considered to be the main bilingual resource for the phrase-based statistical machine translation (PBSMT) model. During translation, a source sentence is decomposed into several phrases. The best match of each source phrase is selected among several target-side counterparts within the phrase table, and processed by the decoder to generate a sentence-level translation. The best match is chosen according to several factors, including a set of bilingual features. PBSMT engines by default provide four probability scores in phrase tables which are considered as the main set of bilingual features. Our goal is to enrich that set of features, as a better feature set should yield better translations. We propose new scores generated by a Convolutional Neural Network (CNN) which indicate the semantic relatedness of phrase pairs. We evaluate our model in different experimental settings with different language pairs. We observe significant improvements when the proposed features are incorporated into the PBSMT pipeline.

2015

pdf bib
Benchmarking SMT Performance for Farsi Using the TEP++ Corpus
Peyman Passban | Andy Way | Qun Liu
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

pdf bib
Benchmarking SMT Performance for Farsi Using the TEP++ Corpus
Peyman Passban | Andy Way | Qun Liu
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

pdf bib
Bilingual distributed phrase representations for statistical machin translation
Peyman Passban | Chris Hokamp | Qun Li
Proceedings of Machine Translation Summit XV: Papers