Anant Gupta


2024

pdf bib
Leveraging Contextual Information for Effective Entity Salience Detection
Rajarshi Bhowmik | Marco Ponza | Atharva Tendle | Anant Gupta | Rebecca Jiang | Xingyu Lu | Qian Zhao | Daniel Preotiuc-Pietro
Findings of the Association for Computational Linguistics: NAACL 2024

In text documents such as news articles, the content and key events usually revolve around a subset of all the entities mentioned in a document. These entities, often deemed as salient entities, provide useful cues of the aboutness of a document to a reader. Identifying the salience of entities was found helpful in several downstream applications such as search, ranking, and entity-centric summarization, among others. Prior work on salient entity detection mainly focused on machine learning models that require heavy feature engineering. We show that fine-tuning medium-sized language models with a cross-encoder style architecture yields substantial performance gains over feature engineering approaches. To this end, we conduct a comprehensive benchmarking of four publicly available datasets using models representative of the medium-sized pre-trained language model family. Additionally, we show that zero-shot prompting of instruction-tuned language models yields inferior results, indicating the task’s uniqueness and complexity.

2023

pdf bib
Distillation of encoder-decoder transformers for sequence labelling
Marco Farina | Duccio Pappadopulo | Anant Gupta | Leslie Huang | Ozan Irsoy | Thamar Solorio
Findings of the Association for Computational Linguistics: EACL 2023

Driven by encouraging results on a wide range of tasks, the field of NLP is experiencing an accelerated race to develop bigger language models. This race for bigger models has also underscored the need to continue the pursuit of practical distillation approaches that can leverage the knowledge acquired by these big models in a compute-efficient manner. Having this goal in mind, we build on recent work to propose a hallucination-free framework for sequence tagging that is especially suited for distillation. We show empirical results of new state-of-the-art performance across multiple sequence labelling datasets and validate the usefulness of this framework for distilling a large model in a few-shot learning scenario.