Hany Hassan

Also published as: H. Hassan


2021

pdf bib
Improving Multilingual Translation by Representation and Gradient Regularization
Yilin Yang | Akiko Eriguchi | Alexandre Muzio | Prasad Tadepalli | Stefan Lee | Hany Hassan
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Multilingual Neural Machine Translation (NMT) enables one model to serve all translation directions, including ones that are unseen during training, i.e. zero-shot translation. Despite being theoretically attractive, current models often produce low quality translations – commonly failing to even produce outputs in the right target language. In this work, we observe that off-target translation is dominant even in strong multilingual systems, trained on massive multilingual corpora. To address this issue, we propose a joint approach to regularize NMT models at both representation-level and gradient-level. At the representation level, we leverage an auxiliary target language prediction task to regularize decoder outputs to retain information about the target language. At the gradient level, we leverage a small amount of direct data (in thousands of sentence pairs) to regularize model gradients. Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance by +5.59 and +10.38 BLEU on WMT and OPUS datasets respectively. Moreover, experiments show that our method also works well when the small amount of direct data is not available.

pdf bib
Discovering Representation Sprachbund For Multilingual Pre-Training
Yimin Fan | Yaobo Liang | Alexandre Muzio | Hany Hassan | Houqiang Li | Ming Zhou | Nan Duan
Findings of the Association for Computational Linguistics: EMNLP 2021

Multilingual pre-trained models have demonstrated their effectiveness in many multilingual NLP tasks and enabled zero-shot or few-shot transfer from high-resource languages to low-resource ones. However, due to significant typological differences and contradictions between some languages, such models usually perform poorly on many languages and cross-lingual settings, which shows the difficulty of learning a single model to handle massive diverse languages well at the same time. To alleviate this issue, we present a new multilingual pre-training pipeline. We propose to generate language representation from multilingual pre-trained model and conduct linguistic analysis to show that language representation similarity reflects linguistic similarity from multiple perspectives, including language family, geographical sprachbund, lexicostatistics, and syntax. Then we cluster all the target languages into multiple groups and name each group as a representation sprachbund. Thus, languages in the same representation sprachbund are supposed to boost each other in both pre-training and fine-tuning as they share rich linguistic similarity. We pre-train one multilingual model for each representation sprachbund. Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.

2020

pdf bib
Meta-Learning for Few-Shot NMT Adaptation
Amr Sharaf | Hany Hassan | Hal Daumé III
Proceedings of the Fourth Workshop on Neural Generation and Translation

We present META-MT, a meta-learning approach to adapt Neural Machine Translation (NMT) systems in a few-shot setting. META-MT provides a new approach to make NMT models easily adaptable to many target do- mains with the minimal amount of in-domain data. We frame the adaptation of NMT systems as a meta-learning problem, where we learn to adapt to new unseen domains based on simulated offline meta-training domain adaptation tasks. We evaluate the proposed meta-learning strategy on ten domains with general large scale NMT systems. We show that META-MT significantly outperforms classical domain adaptation when very few in- domain examples are available. Our experiments shows that META-MT can outperform classical fine-tuning by up to 2.5 BLEU points after seeing only 4, 000 translated words (300 parallel sentences).

pdf bib
Multi-task Learning for Multilingual Neural Machine Translation
Yiren Wang | ChengXiang Zhai | Hany Hassan
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

While monolingual data has been shown to be useful in improving bilingual neural machine translation (NMT), effectively and efficiently leveraging monolingual data for Multilingual NMT (MNMT) systems is a less explored area. In this work, we propose a multi-task learning (MTL) framework that jointly trains the model with the translation task on bitext data and two denoising tasks on the monolingual data. We conduct extensive empirical studies on MNMT systems with 10 language pairs from WMT datasets. We show that the proposed approach can effectively improve the translation quality for both high-resource and low-resource languages with large margin, achieving significantly better results than the individual bilingual models. We also demonstrate the efficacy of the proposed approach in the zero-shot setup for language pairs without bitext training data. Furthermore, we show the effectiveness of MTL over pre-training approaches for both NMT and cross-lingual transfer learning NLU tasks; the proposed approach outperforms massive scale models trained on single task.

pdf bib
FastFormers: Highly Efficient Transformer Models for Natural Language Understanding
Young Jin Kim | Hany Hassan
Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing

Transformer-based models are the state-of-the-art for Natural Language Understanding (NLU) applications. Models are getting bigger and better on various tasks. However, Transformer models remain computationally challenging since they are not efficient at inference-time compared to traditional approaches. In this paper, we present FastFormers, a set of recipes to achieve efficient inference-time performance for Transformer-based models on various NLU tasks. We show how carefully utilizing knowledge distillation, structured pruning and numerical optimization can lead to drastic improvements on inference efficiency. We provide effective recipes that can guide practitioners to choose the best settings for various NLU tasks and pretrained models. Applying the proposed recipes to the SuperGLUE benchmark, we achieve from 9.8x up to 233.9x speed-up compared to out-of-the-box models on CPU. On GPU, we also achieve up to 12.4x speed-up with the presented methods. We show that FastFormers can drastically reduce cost of serving 100 million requests from 4,223 USD to just 18 USD on an Azure F16s_v2 instance. This translates to a sustainable runtime by reducing energy consumption 6.9x - 125.8x according to the metrics used in the SustaiNLP 2020 shared task.

2019

pdf bib
From Research to Production and Back: Ludicrously Fast Neural Machine Translation
Young Jin Kim | Marcin Junczys-Dowmunt | Hany Hassan | Alham Fikri Aji | Kenneth Heafield | Roman Grundkiewicz | Nikolay Bogoychev
Proceedings of the 3rd Workshop on Neural Generation and Translation

This paper describes the submissions of the “Marian” team to the WNGT 2019 efficiency shared task. Taking our dominating submissions to the previous edition of the shared task as a starting point, we develop improved teacher-student training via multi-agent dual-learning and noisy backward-forward translation for Transformer-based student models. For efficient CPU-based decoding, we propose pre-packed 8-bit matrix products, improved batched decoding, cache-friendly student architectures with parameter sharing and light-weight RNN-based decoder architectures. GPU-based decoding benefits from the same architecture changes, from pervasive 16-bit inference and concurrent streams. These modifications together with profiler-based C++ code optimization allow us to push the Pareto frontier established during the 2018 edition towards 24x (CPU) and 14x (GPU) faster models at comparable or higher BLEU values. Our fastest CPU model is more than 4x faster than last year’s fastest submission at more than 3 points higher BLEU. Our fastest GPU model at 1.5 seconds translation time is slightly faster than last year’s fastest RNN-based submissions, but outperforms them by more than 4 BLEU and 10 BLEU points respectively.

pdf bib
Selecting, Planning, and Rewriting: A Modular Approach for Data-to-Document Generation and Translation
Lesly Miculicich | Marc Marone | Hany Hassan
Proceedings of the 3rd Workshop on Neural Generation and Translation

In this paper, we report our system submissions to all 6 tracks of the WNGT 2019 shared task on Document-Level Generation and Translation. The objective is to generate a textual document from either structured data: generation task, or a document in a different language: translation task. For the translation task, we focused on adapting a large scale system trained on WMT data by fine tuning it on the RotoWire data. For the generation task, we participated with two systems based on a selection and planning model followed by (a) a simple language model generation, and (b) a GPT-2 pre-trained language model approach. The selection and planning module chooses a subset of table records in order, and the language models produce text given such a subset.

pdf bib
Multi-Source Cross-Lingual Model Transfer: Learning What to Share
Xilun Chen | Ahmed Hassan Awadallah | Hany Hassan | Wei Wang | Claire Cardie
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Modern NLP applications have enjoyed a great boost utilizing neural networks models. Such deep neural models, however, are not applicable to most human languages due to the lack of annotated training data for various NLP tasks. Cross-lingual transfer learning (CLTL) is a viable method for building NLP models for a low-resource target language by leveraging labeled data from other (source) languages. In this work, we focus on the multilingual transfer setting where training data in multiple source languages is leveraged to further boost target language performance. Unlike most existing methods that rely only on language-invariant features for CLTL, our approach coherently utilizes both language-invariant and language-specific features at instance level. Our model leverages adversarial networks to learn language-invariant features, and mixture-of-experts models to dynamically exploit the similarity between the target language and each individual source language. This enables our model to learn effectively what to share between various languages in the multilingual setup. Moreover, when coupled with unsupervised multilingual embeddings, our model can operate in a zero-resource setting where neither target language training data nor cross-lingual resources are available. Our model achieves significant performance gains over prior art, as shown in an extensive set of experiments over multiple text classification and sequence tagging tasks including a large-scale industry dataset.

pdf bib
Morphology-aware Word-Segmentation in Dialectal Arabic Adaptation of Neural Machine Translation
Ahmed Tawfik | Mahitab Emam | Khaled Essam | Robert Nabil | Hany Hassan
Proceedings of the Fourth Arabic Natural Language Processing Workshop

Parallel corpora available for building machine translation (MT) models for dialectal Arabic (DA) are rather limited. The scarcity of resources has prompted the use of Modern Standard Arabic (MSA) abundant resources to complement the limited dialectal resource. However, dialectal clitics often differ between MSA and DA. This paper compares morphology-aware DA word segmentation to other word segmentation approaches like Byte Pair Encoding (BPE) and Sub-word Regularization (SR). A set of experiments conducted on Egyptian Arabic (EA), Levantine Arabic (LA), and Gulf Arabic (GA) show that a sufficiently accurate morphology-aware segmentation used in conjunction with BPE outperforms the other word segmentation approaches.

2018

pdf bib
Universal Neural Machine Translation for Extremely Low Resource Languages
Jiatao Gu | Hany Hassan | Jacob Devlin | Victor O.K. Li
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

In this paper, we propose a new universal machine translation approach focusing on languages with a limited amount of parallel data. Our proposed approach utilizes a transfer-learning approach to share lexical and sentence level representations across multiple source languages into one target language. The lexical part is shared through a Universal Lexical Representation to support multi-lingual word-level sharing. The sentence-level sharing is represented by a model of experts from all source languages that share the source encoders with all other languages. This enables the low-resource language to utilize the lexical and sentence representations of the higher resource languages. Our approach is able to achieve 23 BLEU on Romanian-English WMT2016 using a tiny parallel corpus of 6k sentences, compared to the 18 BLEU of strong baseline system which uses multi-lingual training and back-translation. Furthermore, we show that the proposed approach can achieve almost 20 BLEU on the same dataset through fine-tuning a pre-trained multi-lingual system in a zero-shot setting.

2015

pdf bib
Learning Translation Models from Monolingual Continuous Representations
Kai Zhao | Hany Hassan | Michael Auli
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

pdf bib
Graph-based Semi-Supervised Learning of Translation Models from Monolingual Data
Avneesh Saluja | Hany Hassan | Kristina Toutanova | Chris Quirk
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2013

pdf bib
Social Text Normalization using Contextual Graph Random Walks
Hany Hassan | Arul Menezes
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
MSR-FBK IWSLT 2013 SLT system description
Anthony Aue | Qin Gao | Hany Hassan | Xiaodong He | Gang Li | Nicholas Ruiz | Frank Seide
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the systems used for the MSR+FBK submission for the SLT track of IWSLT 2013. Starting from a baseline system we made a series of iterative and additive improvements, including a novel method for processing bilingual data used to train MT systems for use on ASR output. Our primary submission is a system combination of five individual systems, combining the output of multiple ASR engines with multiple MT techniques. There are two contrastive submissions to help place the combined system in context. We describe the systems used and present results on the test sets.

2009

pdf bib
Lexicalized Semi-incremental Dependency Parsing
Hany Hassan | Khalil Sima’an | Andy Way
Proceedings of the International Conference RANLP-2009

pdf bib
A Syntactified Direct Translation Model with Linear-time Decoding
Hany Hassan | Khalil Sima’an | Andy Way
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2008

pdf bib
Language Independent Text Correction using Finite State Automata
Ahmed Hassan | Sara Noeman | Hany Hassan
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

pdf bib
Exploiting alignment techniques in MATREX: the DCU machine translation system for IWSLT 2008.
Yanjun Ma | John Tinsley | Hany Hassan | Jinhua Du | Andy Way
Proceedings of the 5th International Workshop on Spoken Language Translation: Evaluation Campaign

In this paper, we give a description of the machine translation (MT) system developed at DCU that was used for our third participation in the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT 2008). In this participation, we focus on various techniques for word and phrase alignment to improve system quality. Specifically, we try out our word packing and syntax-enhanced word alignment techniques for the Chinese–English task and for the English–Chinese task for the first time. For all translation tasks except Arabic–English, we exploit linguistically motivated bilingual phrase pairs extracted from parallel treebanks. We smooth our translation tables with out-of-domain word translations for the Arabic–English and Chinese–English tasks in order to solve the problem of the high number of out of vocabulary items. We also carried out experiments combining both in-domain and out-of-domain data to improve system performance and, finally, we deploy a majority voting procedure combining a language model-based method and a translation-based method for case and punctuation restoration. We participated in all the translation tasks and translated both the single-best ASR hypotheses and the correct recognition results. The translation results confirm that our new word and phrase alignment techniques are often helpful in improving translation quality, and the data combination method we proposed can significantly improve system performance.

2007

pdf bib
Arabic Cross-Document Person Name Normalization
Walid Magdy | Kareem Darwish | Ossama Emam | Hany Hassan
Proceedings of the 2007 Workshop on Computational Approaches to Semitic Languages: Common Issues and Resources

pdf bib
BioNoculars: Extracting Protein-Protein Interactions from Biomedical Text
Amgad Madkour | Kareem Darwish | Hany Hassan | Ahmed Hassan | Ossama Emam
Biological, translational, and clinical language processing

pdf bib
Supertagged Phrase-Based Statistical Machine Translation
Hany Hassan | Khalil Sima’an | Andy Way
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

pdf bib
MaTrEx: the DCU machine translation system for IWSLT 2007
Hany Hassan | Yanjun Ma | Andy Way
Proceedings of the Fourth International Workshop on Spoken Language Translation

In this paper, we give a description of the machine translation system developed at DCU that was used for our second participation in the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT 2007). In this participation, we focus on some new methods to improve system quality. Specifically, we try our word packing technique for different language pairs, we smooth our translation tables with out-of-domain word translations for the Arabic–English and Chinese–English tasks in order to solve the high number of out of vocabulary items, and finally we deploy a translation-based model for case and punctuation restoration. We participated in both the classical and challenge tasks for the following translation directions: Chinese–English, Japanese–English and Arabic–English. For the last two tasks, we translated both the single-best ASR hypotheses and the correct recognition results; for Chinese–English, we just translated the correct recognition results. We report the results of the system for the provided evaluation sets, together with some additional experiments carried out following identification of some simple tokenisation errors in the official runs.

2006

pdf bib
Unsupervised Information Extraction Approach Using Graph Mutual Reinforcement
Hany Hassan | Ahmed Hassan | Ossama Emam
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

pdf bib
Graph Based Semi-Supervised Approach for Information Extraction
Hany Hassan | Ahmed Hassan | Sara Noeman
Proceedings of TextGraphs: the First Workshop on Graph Based Methods for Natural Language Processing

2005

pdf bib
Examining the Effect of Improved Context Sensitive Morphology on Arabic Information Retrieval
Kareem Darwish | Hany Hassan | Ossama Emam
Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages

pdf bib
An Integrated Approach for Arabic-English Named Entity Translation
Hany Hassan | Jeffrey Sorensen
Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages

2004

pdf bib
A Statistical Model for Multilingual Entity Detection and Tracking
R. Florian | H. Hassan | A. Ittycheriah | H. Jing | N. Kambhatla | X. Luo | N. Nicolov | S. Roukos
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004

2003

pdf bib
TIPS: A Translingual Information Processing System
Yaser Al-Onaizan | Radu Florian | Martin Franz | Hany Hassan | Young-Suk Lee | J. Scott McCarley | Kishore Papineni | Salim Roukos | Jeffrey Sorensen | Christoph Tillmann | Todd Ward | Fei Xia
Companion Volume of the Proceedings of HLT-NAACL 2003 - Demonstrations

pdf bib
Language Model Based Arabic Word Segmentation
Young-Suk Lee | Kishore Papineni | Salim Roukos | Ossama Emam | Hany Hassan
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics