Sourav Dutta


Aligned Weight Regularizers for Pruning Pretrained Neural Networks
James O’Neill | Sourav Dutta | Haytham Assem
Findings of the Association for Computational Linguistics: ACL 2022

Pruning aims to reduce the number of parameters while maintaining performance close to the original network. This work proposes a novel self-distillation based pruning strategy, whereby the representational similarity between the pruned and unpruned versions of the same network is maximized. Unlike previous approaches that treat distillation and pruning separately, we use distillation to inform the pruning criteria, without requiring a separate student network as in knowledge distillation. We show that the proposed cross-correlation objective for self-distilled pruning implicitly encourages sparse solutions, naturally complementing magnitude-based pruning criteria. Experiments on the GLUE and XGLUE benchmarks show that self-distilled pruning increases mono- and cross-lingual language model performance. Self-distilled pruned models also outperform smaller Transformers with an equal number of parameters and are competitive against (6 times) larger distilled networks. We also observe that self-distillation (1) maximizes class separability, (2) increases the signal-to-noise ratio, and (3) converges faster after pruning steps, providing further insights into why self-distilled pruning improves generalization.

Multi-Stage Framework with Refinement Based Point Set Registration for Unsupervised Bi-Lingual Word Alignment
Silviu Vlad Oprea | Sourav Dutta | Haytham Assem
Proceedings of the 29th International Conference on Computational Linguistics

Cross-lingual alignment of word embeddings is important for knowledge transfer across languages, and for improving machine translation and other multi-lingual applications. Current unsupervised approaches, which rely on learning structure-preserving transformations using adversarial networks and refinement strategies, suffer from instability and convergence issues. This paper proposes BioSpere, a novel multi-stage framework for unsupervised mapping of bi-lingual word embeddings onto a shared vector space, combining adversarial initialization, a refinement procedure, and point set registration. Experiments on parallel dictionary induction and word similarity demonstrate state-of-the-art unsupervised results for BioSpere on diverse languages, showcasing robustness against variable adversarial performance.


Cross-lingual Sentence Embedding using Multi-Task Learning
Koustava Goswami | Sourav Dutta | Haytham Assem | Theodorus Fransen | John P. McCrae
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Multilingual sentence embeddings capture rich semantic information not only for measuring similarity between texts but also for catering to a broad range of downstream cross-lingual NLP tasks. State-of-the-art multilingual sentence embedding models require large parallel corpora to learn efficiently, which confines the scope of these models. In this paper, we propose a novel sentence embedding framework based on an unsupervised loss function for generating effective multilingual sentence embeddings, eliminating the need for parallel corpora. We capture semantic similarity and relatedness between sentences using a multi-task loss function for training a dual encoder model mapping different languages onto the same vector space. We demonstrate the efficacy of an unsupervised as well as a weakly supervised variant of our framework on STS, BUCC and Tatoeba benchmark tasks. The proposed unsupervised sentence embedding framework outperforms even supervised state-of-the-art methods for certain under-resourced languages on the Tatoeba dataset and on a monolingual benchmark. Further, we show enhanced zero-shot learning capabilities for more than 30 languages, with the model being trained on only 13 languages. Our model can be extended to a wide range of languages from any language family, as it overcomes the requirement of parallel corpora for training.

EdinSaar@WMT21: North-Germanic Low-Resource Multilingual NMT
Svetlana Tchistiakova | Jesujoba Alabi | Koel Dutta Chowdhury | Sourav Dutta | Dana Ruiter
Proceedings of the Sixth Conference on Machine Translation

We describe the EdinSaar submission to the shared task of Multilingual Low-Resource Translation for North Germanic Languages at the Sixth Conference on Machine Translation (WMT2021). We submit multilingual translation models for translations to/from Icelandic (is), Norwegian-Bokmål (nb), and Swedish (sv). We employ various experimental approaches, including multilingual pre-training, back-translation, fine-tuning, and ensembling. In most translation directions, our models outperform other submitted systems.

DTAFA: Decoupled Training Architecture for Efficient FAQ Retrieval
Haytham Assem | Sourav Dutta | Edward Burgin
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Automated Frequently Asked Question (FAQ) retrieval is an effective way to give prompt responses to natural-language queries, offering large-scale service providers an efficient platform for presenting readily available information pertaining to customers’ questions. We propose DTAFA, a novel multi-lingual FAQ retrieval system that aims at improving the top-1 retrieval accuracy with the fewest parameters. We propose two decoupled deep learning architectures trained for (i) candidate generation via text classification for a user question, and (ii) learning fine-grained semantic similarity between user questions and the FAQ repository for candidate refinement. We validate our system using real-life enterprise data as well as an open-source dataset. Empirically, we show that DTAFA achieves better accuracy than the existing state of the art while requiring nearly 30× fewer training parameters.


UdS-DFKI@WMT20: Unsupervised MT and Very Low Resource Supervised MT for German-Upper Sorbian
Sourav Dutta | Jesujoba Alabi | Saptarashmi Bandyopadhyay | Dana Ruiter | Josef van Genabith
Proceedings of the Fifth Conference on Machine Translation

This paper describes the UdS-DFKI submission to the shared task for unsupervised machine translation (MT) and very low-resource supervised MT between German (de) and Upper Sorbian (hsb) at the Fifth Conference on Machine Translation (WMT20). We submit systems for both the supervised and unsupervised tracks. In addition to experimental approaches such as bitext mining, model pre-training, and iterative back-translation, we employ a factored machine translation approach on a small BPE vocabulary.


C3EL: A Joint Model for Cross-Document Co-Reference Resolution and Entity Linking
Sourav Dutta | Gerhard Weikum
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Cross-Document Co-Reference Resolution using Sample-Based Clustering with Knowledge Enrichment
Sourav Dutta | Gerhard Weikum
Transactions of the Association for Computational Linguistics, Volume 3

Identifying and linking named entities across information sources is the basis of knowledge acquisition and at the heart of Web search, recommendations, and analytics. An important problem in this context is cross-document co-reference resolution (CCR): computing equivalence classes of textual mentions denoting the same entity, within and across documents. Prior methods employ ranking, clustering, or probabilistic graphical models using syntactic features and distant features from knowledge bases. However, these methods exhibit limitations regarding run-time and robustness. This paper presents the CROCS framework for unsupervised CCR, improving the state of the art in two ways. First, we extend the way knowledge bases are harnessed, by constructing a notion of semantic summaries for intra-document co-reference chains using co-occurring entity mentions belonging to different chains. Second, we reduce the computational cost by a new algorithm that embeds sample-based bisection, using spectral clustering or graph partitioning, in a hierarchical clustering process. This allows scaling up CCR to large corpora. Experiments with three datasets show significant gains in output quality over the best prior methods and demonstrate the run-time efficiency of CROCS.