Karthik Raman


2024

pdf bib
How Does Beam Search improve Span-Level Confidence Estimation in Generative Sequence Labeling?
Kazuma Hashimoto | Iftekhar Naim | Karthik Raman
Proceedings of the 1st Workshop on Uncertainty-Aware NLP (UncertaiNLP 2024)

Sequence labeling is a core task in text understanding for IE/IR systems. Text generation models have increasingly become the go-to solution for such tasks (e.g., entity extraction and dialog slot filling). While most research has focused on the labeling accuracy, a key aspect – of vital practical importance – has slipped through the cracks: understanding model confidence. More specifically, we lack a principled understanding of how to reliably gauge the confidence of a model in its predictions for each labeled span. This paper aims to provide some empirical insights on estimating model confidence for generative sequence labeling. Most notably, we find that simply using the decoder’s output probabilities is not the best in realizing well-calibrated confidence estimates. As verified over six public datasets of different tasks, we show that our proposed approach – which leverages statistics from top-k predictions by a beam search – significantly reduces calibration errors of the predictions of a generative sequence labeling model.

2022

pdf bib
Transforming Sequence Tagging Into A Seq2Seq Task
Karthik Raman | Iftekhar Naim | Jiecao Chen | Kazuma Hashimoto | Kiran Yalasangi | Krishna Srinivasan
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Pretrained, large, generative language models (LMs) have had great success in a wide range of sequence tagging and structured prediction tasks. Casting a sequence tagging task as a Seq2Seq one requires deciding the formats of the input and output sequences. However, we lack a principled understanding of the trade-offs associated with these formats (such as the effect on model accuracy, sequence length, multilingual generalization, hallucination). In this paper, we rigorously study different formats one could use for casting input text sentences and their output labels into the input and target (i.e., output) of a Seq2Seq model. Along the way, we introduce a new format, which we show to to be both simpler and more effective. Additionally the new format demonstrates significant gains in the multilingual settings – both zero-shot transfer learning and joint training. Lastly, we find that the new format is more robust and almost completely devoid of hallucination – an issue we find common in existing formats. With well over a 1000 experiments studying 14 different formats, over 7 diverse public benchmarks – including 3 multilingual datasets spanning 7 languages – we believe our findings provide a strong empirical basis in understanding how we should tackle sequence tagging tasks.

pdf bib
QUILL: Query Intent with Large Language Models using Retrieval Augmentation and Multi-stage Distillation
Krishna Srinivasan | Karthik Raman | Anupam Samanta | Lingrui Liao | Luca Bertelli | Michael Bendersky
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track

Large Language Models (LLMs) have shown impressive results on a variety of text understanding tasks. Search queries though pose a unique challenge, given their short-length and lack of nuance or context. Complicated feature engineering efforts do not always lead to downstream improvements as their performance benefits may be offset by increased complexity of knowledge distillation. Thus, in this paper we make the following contributions: (1) We demonstrate that Retrieval Augmentation of queries provides LLMs with valuable additional context enabling improved understanding. While Retrieval Augmentation typically increases latency of LMs (thus hurting distillation efficacy), (2) we provide a practical and effective way of distilling Retrieval Augmentation LLMs. Specifically, we use a novel two-stage distillation approach that allows us to carry over the gains of retrieval augmentation, without suffering the increased compute typically associated with it. (3) We demonstrate the benefits of the proposed approach (QUILL) on a billion-scale, real-world query understanding system resulting in huge gains. Via extensive experiments, including on public benchmarks, we believe this work offers a recipe for practical use of retrieval-augmented query understanding.

2020

pdf bib
DiPair: Fast and Accurate Distillation for Trillion-Scale Text Matching and Pair Modeling
Jiecao Chen | Liu Yang | Karthik Raman | Michael Bendersky | Jung-Jung Yeh | Yun Zhou | Marc Najork | Danyang Cai | Ehsan Emadzadeh
Findings of the Association for Computational Linguistics: EMNLP 2020

Pre-trained models like BERT ((Devlin et al., 2018) have dominated NLP / IR applications such as single sentence classification, text pair classification, and question answering. However, deploying these models in real systems is highly non-trivial due to their exorbitant computational costs. A common remedy to this is knowledge distillation (Hinton et al., 2015), leading to faster inference. However – as we show here – existing works are not optimized for dealing with pairs (or tuples) of texts. Consequently, they are either not scalable or demonstrate subpar performance. In this work, we propose DiPair — a novel framework for distilling fast and accurate models on text pair tasks. Coupled with an end-to-end training strategy, DiPair is both highly scalable and offers improved quality-speed tradeoffs. Empirical studies conducted on both academic and real-world e-commerce benchmarks demonstrate the efficacy of the proposed approach with speedups of over 350x and minimal quality drop relative to the cross-attention teacher BERT model.

2019

pdf bib
Learning Multilingual Word Embeddings Using Image-Text Data
Karan Singhal | Karthik Raman | Balder ten Cate
Proceedings of the Second Workshop on Shortcomings in Vision and Language

There has been significant interest recently in learning multilingual word embeddings – in which semantically similar words across languages have similar embeddings. State-of-the-art approaches have relied on expensive labeled data, which is unavailable for low-resource languages, or have involved post-hoc unification of monolingual embeddings. In the present paper, we investigate the efficacy of multilingual embeddings learned from weakly-supervised image-text data. In particular, we propose methods for learning multilingual embeddings using image-text data, by enforcing similarity between the representations of the image and that of the text. Our experiments reveal that even without using any expensive labeled data, a bag-of-words-based embedding model trained on image-text data achieves performance comparable to the state-of-the-art on crosslingual semantic similarity tasks.

2010

pdf bib
Multilingual Pseudo-Relevance Feedback: Performance Study of Assisting Languages
Manoj Kumar Chinnakotla | Karthik Raman | Pushpak Bhattacharyya
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics