Haytham Assem


2022

pdf bib
Aligned Weight Regularizers for Pruning Pretrained Neural Networks
James O’ Neill | Sourav Dutta | Haytham Assem
Findings of the Association for Computational Linguistics: ACL 2022

Pruning aims to reduce the number of parameters while maintaining performance close to the original network. This work proposes a novel self-distillation based pruning strategy, whereby the representational similarity between the pruned and unpruned versions of the same network is maximized. Unlike previous approaches that treat distillation and pruning separately, we use distillation to inform the pruning criteria, without requiring a separate student network as in knowledge distillation. We show that the proposed cross-correlation objective for self-distilled pruning implicitly encourages sparse solutions, naturally complementing magnitude-based pruning criteria. Experiments on the GLUE and XGLUE benchmarks show that self-distilled pruning increases mono- and cross-lingual language model performance. Self-distilled pruned models also outperform smaller Transformers with an equal number of parameters and are competitive against (6 times) larger distilled networks. We also observe that self-distillation (1) maximizes class separability, (2) increases the signal-to-noise ratio, and (3) converges faster after pruning steps, providing further insights into why self-distilled pruning improves generalization.

2021

pdf bib
DTAFA: Decoupled Training Architecture for Efficient FAQ Retrieval
Haytham Assem | Sourav Dutta | Edward Burgin
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Automated Frequently Asked Question (FAQ) retrieval provides an effective procedure to provide prompt responses to natural language based queries, providing an efficient platform for large-scale service-providing companies for presenting readily available information pertaining to customers’ questions. We propose DTAFA, a novel multi-lingual FAQ retrieval system that aims at improving the top-1 retrieval accuracy with the least number of parameters. We propose two decoupled deep learning architectures trained for (i) candidate generation via text classification for a user question, and (ii) learning fine-grained semantic similarity between user questions and the FAQ repository for candidate refinement. We validate our system using real-life enterprise data as well as open source dataset. Empirically we show that DTAFA achieves better accuracy compared to existing state-of-the-art while requiring nearly 30× lesser number of training parameters.

pdf bib
Cross-lingual Sentence Embedding using Multi-Task Learning
Koustava Goswami | Sourav Dutta | Haytham Assem | Theodorus Fransen | John P. McCrae
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Multilingual sentence embeddings capture rich semantic information not only for measuring similarity between texts but also for catering to a broad range of downstream cross-lingual NLP tasks. State-of-the-art multilingual sentence embedding models require large parallel corpora to learn efficiently, which confines the scope of these models. In this paper, we propose a novel sentence embedding framework based on an unsupervised loss function for generating effective multilingual sentence embeddings, eliminating the need for parallel corpora. We capture semantic similarity and relatedness between sentences using a multi-task loss function for training a dual encoder model mapping different languages onto the same vector space. We demonstrate the efficacy of an unsupervised as well as a weakly supervised variant of our framework on STS, BUCC and Tatoeba benchmark tasks. The proposed unsupervised sentence embedding framework outperforms even supervised state-of-the-art methods for certain under-resourced languages on the Tatoeba dataset and on a monolingual benchmark. Further, we show enhanced zero-shot learning capabilities for more than 30 languages, with the model being trained on only 13 languages. Our model can be extended to a wide range of languages from any language family, as it overcomes the requirement of parallel corpora for training.