Vivek Sridhar

2025

CourtEval: A Courtroom-Based Multi-Agent Evaluation Framework
Sandeep Kumar | Abhijit A Nargund | Vivek Sridhar
Findings of the Association for Computational Linguistics: ACL 2025

Automated evaluation is crucial for assessing the quality of natural language text, especially in open-ended generation tasks, given the costly and time-consuming nature of human evaluation. Existing automatic evaluation metrics like ROUGE and BLEU often show low correlation with human judgments. As large language models (LLMs) continue to evolve, researchers have explored their use as alternatives to human evaluators. Although single-agent approaches have shown potential, results indicate that further progress is required to close the gap between their performance and the quality of human assessments. Acknowledging that human evaluations involve multiple annotators, the multi-agent approach allows LLMs to collaborate, enhancing efficiency and effectiveness in handling complex tasks. In this paper, we present CourtEval, a novel Multi-Agent Evaluation Framework modeled after courtroom dynamics. Each agent takes on a distinct role: the Grader, similar to a judge, assigns an initial score; the Critic, like a prosecutor, challenges this score; and the Defender, akin to a defense attorney, defends it. Based on the input from both the Critic and Defender, the Grader re-evaluates the score, leading to a more balanced and fair final decision through this adversarial process. CourtEval substantially outperforms the previous state-of-the-art methods in two meta-evaluation benchmarks in NLG evaluation, SummEval and TopicalChat.

2023

pdf bib abs

NLMs: Augmenting Negation in Language Models
Rituraj Singh | Rahul Kumar | Vivek Sridhar
Findings of the Association for Computational Linguistics: EMNLP 2023

Negation is the fundamental component in a natural language that reverses the semantic meaning of a sentence. It plays an extremely important role across a wide range of applications, yet they are underrepresented in pre-trained language models (LMs), resulting often in wrong inferences. In this work, we try to improve the underlying understanding of the negation in the pre-trained LMs. To augment negation understanding, we propose a language model objective with a weighted cross-entropy loss and elastic weight consolidation regularization. We reduce the mean top 1 error rate for BERT-base to 1.1%, BERT-large to 0.78%, RoBERTA-base to 3.74%, RoBERTA-large to 0.01% on the negated LAMA dataset. It minimizes the BERT error rate by a margin of 8% and also outperform the existing negation models. We also provide empirical evidences that negated augmented models outperform the classical models on original as well as negation benchmarks on natural language inference tasks.

2019

pdf bib abs

Robust Text Classification using Sub-Word Information in Input Word Representations.
Bhanu Prakash Mahanti | Priyank Chhipa | Vivek Sridhar | Vinuthkumar Prasan
Proceedings of the 16th International Conference on Natural Language Processing

Word based deep learning approaches have been used with increasing success recently to solve Natural Language Processing problems like Machine Translation, Language Modelling and Text Classification. However, performance of these word based models is limited by the vocabulary of the training corpus. Alternate approaches using character based models have been proposed to overcome the unseen word problems arising for a variety of reasons. However, character based models fail to capture the sequential relationship of words inherently present in texts. Hence, there is scope for improvement by addressing the unseen word problem while also maintaining the sequential context through word based models. In this work, we propose a method where the input embedding vector incorporates sub-word information but is also suitable for use with models which successfully capture the sequential nature of text. We further attempt to establish that using such a word representation as input makes the model robust to unseen words, particularly arising due to tokenization and spelling errors, which is a common problem in systems where a typing interface is one of the input modalities.

Co-authors

Vinuthkumar Prasan 1

Rituraj Singh 1

Venues

Findings2
ICON1

Fix author