Prabhanjan Kambadur


2024

Academics Can Contribute to Domain-Specialized Language Models
Mark Dredze | Genta Indra Winata | Prabhanjan Kambadur | Shijie Wu | Ozan Irsoy | Steven Lu | Vadim Dabravolski | David S Rosenberg | Sebastian Gehrmann
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Commercially available models dominate academic leaderboards. While impressive, this has concentrated research on creating and adapting general-purpose models to improve NLP leaderboard standings for large language models. However, leaderboards collect many individual tasks and general-purpose models often underperform in specialized domains; domain-specific or adapted models yield superior results. This focus on large general-purpose models excludes many academics and draws attention away from areas where they can make important contributions. We advocate for a renewed focus on developing and evaluating domain- and task-specific models, and highlight the unique role of academics in this endeavor.

2019

A Semi-Markov Structured Support Vector Machine Model for High-Precision Named Entity Recognition
Ravneet Arora | Chen-Tse Tsai | Ketevan Tsereteli | Prabhanjan Kambadur | Yi Yang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Named entity recognition (NER) is the backbone of many NLP solutions. F1 score, the harmonic mean of precision and recall, is often used to select and evaluate the best models. However, when precision needs to be prioritized over recall, a state-of-the-art model might not be the best choice. There is little in the literature that directly addresses training-time modifications to achieve higher precision information extraction. In this paper, we propose a neural semi-Markov structured support vector machine model that controls the precision-recall trade-off by assigning weights to different types of errors in the loss-augmented inference during training. The semi-Markov property provides more accurate phrase-level predictions, thereby improving performance. We empirically demonstrate the advantage of our model when high precision is required by comparing against strong CRF-based baselines. In our experiments with the CoNLL 2003 dataset, our model achieves a better precision-recall trade-off at various precision levels.

2016

Geolocation for Twitter: Timing Matters
Mark Dredze | Miles Osborne | Prabhanjan Kambadur
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies