Talaat Khalil


2023

Improving Domain Robustness in Neural Machine Translation with Fused Topic Knowledge Embeddings
Danai Xezonaki | Talaat Khalil | David Stap | Brandon Denis
Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track

Domain robustness is a key challenge for Neural Machine Translation (NMT). Translating text from a different distribution than the training set requires NMT models to generalize well to unseen domains. In this work, we propose a novel way to address domain robustness by fusing external topic knowledge into the NMT architecture. We employ a pretrained denoising autoencoder and fuse topic information into the system during both continued pretraining and finetuning of the model on the downstream NMT task. Our results show that incorporating external topic knowledge, as well as additional pretraining, can improve the out-of-domain performance of NMT models. The proposed methodology matches the state of the art in out-of-domain performance. Our analysis shows that a low overlap between the pretraining and finetuning corpora and the quality of the topic representations both help NMT systems become more robust under domain shift.

2022

Empirical Evaluation of Language Agnostic Filtering of Parallel Data for Low Resource Languages
Praveen Dakwale | Talaat Khalil | Brandon Denis
Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation

HuaAMS at SemEval-2022 Task 8: Combining Translation and Domain Pre-training for Cross-lingual News Article Similarity
Sai Sandeep Sharma Chittilla | Talaat Khalil
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper describes our submission to the SemEval-2022 Multilingual News Article Similarity task. We experiment with different approaches that use a pre-trained language model fitted with a regression head to predict similarity scores for a given pair of news articles. Our best-performing systems include two key steps: 1) pre-training with in-domain data and 2) training data enrichment through machine translation. Our final submission is an ensemble of predictions from our top systems. While we show the significance of pre-training and augmentation, we believe the issue of language coverage calls for more attention.

2019

Cross-lingual intent classification in a low resource industrial setting
Talaat Khalil | Kornel Kiełczewski | Georgios Christos Chouliaras | Amina Keldibek | Maarten Versteegh
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

This paper explores different approaches to multilingual intent classification in a low-resource setting. Recent advances in multilingual text representations promise cross-lingual transfer for classifiers. We investigate the potential for this transfer in an applied industrial setting and compare it to multilingual classification using machine-translated text. Our results show that while the recently developed methods show promise, practical application calls for a combination of techniques to achieve useful results.

2017

Toward a full-scale neural machine translation in production: the Booking.com use case
Pavel Levin | Nishikant Dhanuka | Talaat Khalil | Fedor Kovalev | Maxim Khalilov
Proceedings of Machine Translation Summit XVI: Commercial MT Users and Translators Track

2016

NileTMRG at SemEval-2016 Task 5: Deep Convolutional Neural Networks for Aspect Category and Sentiment Extraction
Talaat Khalil | Samhaa R. El-Beltagy
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)