Niladri Chatterjee

Also published as: N. Chatterjee

2024

Can LLMs replace Neil deGrasse Tyson? Evaluating the Reliability of LLMs as Science Communicators
Prasoon Bajpai | Niladri Chatterjee | Subhabrata Dutta | Tanmoy Chakraborty
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Large Language Models (LLMs) and AI assistants driven by these models are experiencing exponential growth in usage among both expert and amateur users. In this work, we focus on evaluating the reliability of current LLMs as science communicators. Unlike existing benchmarks, our approach emphasizes assessing these models on scientific question-answering tasks that require a nuanced understanding and awareness of answerability. We introduce a novel dataset, SCiPS-QA, comprising 742 Yes/No queries embedded in complex scientific concepts, along with a benchmarking suite that evaluates LLMs for correctness and consistency across various criteria. We benchmark three proprietary LLMs from the OpenAI GPT family and 13 open-access LLMs from the Meta Llama-2, Llama-3, and Mistral families. While most open-access models significantly underperform compared to GPT-4 Turbo, our experiments identify Llama-3-70B as a strong competitor, often surpassing GPT-4 Turbo in various evaluation aspects. We also find that even the GPT models exhibit a general incompetence in reliably verifying LLM responses. Moreover, we observe an alarming trend where human evaluators are deceived by incorrect responses from GPT-4 Turbo.

2023

LRL_NC at SemEval-2023 Task 6: Sequential Sentence Classification for Legal Documents Using Topic Modeling Features
Kushagri Tandon | Niladri Chatterjee
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

Natural Language Processing techniques can be leveraged to process legal proceedings for various downstream applications, such as summarization of a given judgement, prediction of the judgement for a given legal case, and precedent search, among others. These applications benefit from legal judgement documents that are already segmented into topically coherent units. The current task, namely Rhetorical Role Prediction, aims at categorising each sentence in the sequence of sentences of a judgement document into one of several labels. The system proposed in this work combines topic modeling and RoBERTa to encode sentences in each document. A BiLSTM layer is used to obtain contextualised sentence representations. The Rhetorical Role predictions for each sentence in each document are generated by a final CRF layer of the proposed neuro-computing system. This system secured rank 12 in the official task ranking, achieving a micro-F1 score of 0.7980. The code for the proposed systems has been made available at https://github.com/KushagriT/SemEval23_LegalEval_TeamLRL_NC
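
The abstract says a final CRF layer produces the rhetorical role labels from BiLSTM sentence representations. As a rough illustration of how a linear-chain CRF turns per-sentence scores into one consistent label sequence, the Python sketch below implements standard Viterbi decoding. It is not the authors' code. The function name `viterbi_decode`, the label names `FACTS` and `RULING`, and all scores are hypothetical, and learning the transition scores is out of scope.

```python
from typing import Dict, List, Sequence


def viterbi_decode(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    labels: Sequence[str],
) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain CRF.

    emissions[t][y]   -- score of label y for sentence t (e.g. from a BiLSTM)
    transitions[a][b] -- score of moving from label a to label b
    """
    if not emissions:
        return []
    # best[y] = score of the best path that ends in label y at the current sentence
    best = {y: emissions[0][y] for y in labels}
    backpointers: List[Dict[str, str]] = []
    for scores in emissions[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            # pick the previous label that maximises the path score into y
            prev = max(labels, key=lambda p: best[p] + transitions[p][y])
            new_best[y] = best[prev] + transitions[prev][y] + scores[y]
            pointers[y] = prev
        best = new_best
        backpointers.append(pointers)
    # trace the best path backwards from the best final label
    path = [max(labels, key=lambda y: best[y])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical example: two rhetorical roles, three sentences.
labels = ["FACTS", "RULING"]
emissions = [
    {"FACTS": 2.0, "RULING": 0.0},
    {"FACTS": 0.4, "RULING": 0.5},
    {"FACTS": 0.0, "RULING": 2.0},
]
transitions = {
    "FACTS": {"FACTS": 0.5, "RULING": 0.0},
    "RULING": {"FACTS": -1.0, "RULING": 0.5},
}
print(viterbi_decode(emissions, transitions, labels))  # ['FACTS', 'RULING', 'RULING']
```

The transition scores here reward staying in the same role and penalise moving back from `RULING` to `FACTS`. A CRF layer can capture this kind of document-level regularity, which independent per-sentence classification cannot.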

IITD at SemEval-2023 Task 2: A Multi-Stage Information Retrieval Approach for Fine-Grained Named Entity Recognition
Shivani Choudhary | Niladri Chatterjee | Subir Saha
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

MultiCoNER-II is a fine-grained Named Entity Recognition (NER) task that aims to identify ambiguous and complex named entities in multiple languages, with a small amount of contextual information available. To address this task, we propose a multi-stage information retrieval (IR) pipeline that improves the performance of language models for fine-grained NER. Our approach involves leveraging a combination of a BM25-based IR model and a language model to retrieve relevant passages from a corpus. These passages are then used to train a model that utilizes a weighted average of losses. The prediction is generated by a decoder stack that includes a projection layer and conditional random field. To demonstrate the effectiveness of our approach, we participated in the English track of the MultiCoNER-II competition. Our approach yielded promising results, which we validated through detailed analysis.
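
The retrieval stage above pairs a BM25-based IR model with a language model. BM25 is a standard lexical ranking function, and the Python sketch below scores documents against a query with it. The parameter values `k1=1.5` and `b=0.75`, whitespace tokenisation, the Lucene-style non-negative IDF, and the toy corpus are all assumptions made for illustration. The abstract does not specify the configuration used.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score every document against the query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # document frequency: number of documents containing each term
    df = Counter(term for toks in tokenized for term in set(toks))
    scores: List[float] = []
    for toks in tokenized:
        tf = Counter(toks)
        dl = len(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # non-negative IDF variant (the form used by Lucene)
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # term-frequency saturation (k1) and document-length normalisation (b)
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * dl / avgdl))
        scores.append(score)
    return scores


docs = [
    "the eiffel tower is in paris",
    "paris is a city in france",
    "tokyo tower is in japan",
]
scores = bm25_scores("eiffel tower", docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 2, 1]
```

In a pipeline like the one described, the top-ranked passages would be passed on as extra context for the NER model. How they are combined with the language model is not detailed in the abstract.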

LRL_NC at SemEval-2023 Task 4: The Touche23-George-boole Approach for Multi-Label Classification of Human-Values behind Arguments
Kushagri Tandon | Niladri Chatterjee
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

The task ValueEval aims at assigning a subset of possible human value categories underlying a given argument. Values behind arguments are often determinants in evaluating the relevance and importance of decisions in an ethical sense, which makes them essential for argument mining. The work presented here proposes two systems for this task. Both systems use RoBERTa to encode sentences in each document. System1 makes use of features obtained from training models for two auxiliary tasks, whereas System2 combines RoBERTa with topic modeling to obtain sentence representations. These features are used by a classification head to generate predictions. System1 secured rank 22 in the official task ranking, achieving a macro F1-score of 0.46 on the main dataset. System2 was not part of the official evaluation. Subsequent experiments achieved the highest (among the proposed systems) macro F1-scores of 0.48 (System2), 0.31 (ablation on System1) and 0.33 (ablation on System1) on the main dataset, the Nahj al-Balagha dataset, and the New York Times dataset, respectively.
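
The results above are reported as macro F1-scores over multi-label value-category predictions. The Python sketch below shows one common way to compute this metric: F1 for each category, then the unweighted mean across categories. The function name, the convention that F1 is 0 when precision and recall are both 0, and the two example categories are assumptions. The shared task's official scorer may differ in details.

```python
from typing import List, Sequence, Set


def macro_f1(gold: Sequence[Set[str]], pred: Sequence[Set[str]], labels: Sequence[str]) -> float:
    """Macro-averaged F1 for multi-label classification."""
    f1s: List[float] = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # assumed convention: F1 is 0 when precision and recall are both 0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1s.append(f1)
    # unweighted mean: every category counts equally, however frequent it is
    return sum(f1s) / len(f1s)


# Hypothetical gold and predicted value categories for three arguments.
labels = ["Security", "Tradition"]
gold = [{"Security"}, {"Security", "Tradition"}, set()]
pred = [{"Security"}, {"Tradition"}, {"Tradition"}]
print(round(macro_f1(gold, pred, labels), 4))  # 0.6667
```

Because every category carries equal weight, rare value categories affect the score as much as common ones. This is one reason macro F1 is a demanding metric for this task.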

2022

Summarization of Long Input Texts Using Multi-Layer Neural Network
Niladri Chatterjee | Aadyant Khatri | Raksha Agarwal
Proceedings of the Workshop on Automatic Summarization for Creative Writing

This paper describes the architecture of a novel Multi-Layer Long Text Summarizer (MLLTS) system proposed for the task of creative writing summarization. Such writings are typically very long, often spanning more than 100 pages. Summarizers available online are either not equipped to handle long texts or, when they can generate a summary, produce one of poor quality. The proposed MLLTS system addresses this difficulty by splitting the text into several parts. Each part is then processed by different existing summarizers. A multilayer network is constructed by establishing linkages between the different parts. Several hyperparameters are fine-tuned during training. The system achieved very good ROUGE scores on the test data supplied for the contest.
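
The Python sketch below illustrates only the split-then-summarize idea from this abstract and relies on several assumptions. Each part has a word budget (`max_words`, hypothetical). Sentence boundaries are found with a simple regular expression. A trivial lead-sentence extractor (`lead_summary`, hypothetical) stands in for the existing summarizers. The sketch does not reproduce the multilayer network that links the parts.

```python
import re
from typing import Callable, List

SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_into_parts(text: str, max_words: int = 500) -> List[str]:
    """Split text into consecutive parts of at most max_words words, breaking at sentence ends.

    A single sentence longer than max_words becomes its own (oversized) part.
    """
    parts: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in SENTENCE_END.split(text.strip()):
        n = len(sentence.split())
        # close the current part if this sentence would exceed the budget
        if current and count + n > max_words:
            parts.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        parts.append(" ".join(current))
    return parts


def lead_summary(part: str, k: int = 1) -> str:
    """Stand-in summarizer: keep the first k sentences of a part."""
    return " ".join(SENTENCE_END.split(part.strip())[:k])


def summarize_long_text(
    text: str, summarizer: Callable[[str], str] = lead_summary, max_words: int = 500
) -> str:
    """Summarize each part independently and join the partial summaries in order."""
    return " ".join(summarizer(p) for p in split_into_parts(text, max_words))


text = "One two three. Four five. Six seven eight nine. Ten."
print(split_into_parts(text, max_words=5))
# ['One two three. Four five.', 'Six seven eight nine. Ten.']
print(summarize_long_text(text, max_words=5))  # One two three. Six seven eight nine.
```

Because the summarizer is passed in as a callable, a different summarizer could be substituted for each part, which loosely mirrors applying several existing summarizers as the abstract describes.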

Team LRL_NC at SemEval-2022 Task 4: Binary and Multi-label Classification of PCL using Fine-tuned Transformer-based Models
Kushagri Tandon | Niladri Chatterjee
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

Patronizing and condescending language (PCL) can find its way into many mediums of public discourse. The presence of PCL in text can have negative effects on society. The challenge presented by the task emerges from the subtleties of PCL and various data-dependent constraints. Hence, developing techniques to detect PCL in text before it is propagated is vital. The aim of this paper is twofold: (a) to present systems that can be used to classify a text as containing PCL or not, and (b) to present systems that assign the different categories of PCL present in a text. The proposed systems are primarily rooted in transformer-based pre-trained language models. Among the models submitted for Subtask 1, the best F1-Score of 0.5436 was achieved by a deep-learning-based ensemble model. This system secured rank 29 in the official task ranking. For Subtask 2, the best macro-average F1-Score of 0.339 was achieved by an ensemble model combining a transformer-based neural architecture with gradient boosting label-balanced classifiers. This system secured rank 21 in the official task ranking. In subsequent experiments, a variation in the architecture of a system for Subtask 2 achieved a macro-average F1-Score of 0.3527.

2021

MTL782_IITD at CMCL 2021 Shared Task: Prediction of Eye-Tracking Features Using BERT Embeddings and Linguistic Features
Shivani Choudhary | Kushagri Tandon | Raksha Agarwal | Niladri Chatterjee
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

Reading and comprehension are quintessentially cognitive tasks. Eye movement acts as a surrogate to understand which part of a sentence is critical to the process of comprehension. The aim of the shared task is to predict five eye-tracking features for a given word of the input sentence. We experimented with several models based on LGBM (Light Gradient Boosting Machine) Regression, ANN (Artificial Neural Network), and CNN (Convolutional Neural Network), using BERT embeddings and some combination of linguistic features. Our submission using CNN achieved an average MAE of 4.0639 and ranked 7th in the shared task. The average MAE was further lowered to 3.994 in post-task evaluation.
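
The scores above are given as an average MAE over the predicted eye-tracking features. The Python sketch below shows that arithmetic, assuming the average is the unweighted mean of the per-feature MAEs. The feature names (`nFix`, `FFD`) and values are illustrative only.

```python
from typing import Dict, List


def average_mae(gold: Dict[str, List[float]], pred: Dict[str, List[float]]) -> float:
    """Mean absolute error per feature, then the unweighted mean across features."""
    maes: List[float] = []
    for feature, g in gold.items():
        p = pred[feature]
        maes.append(sum(abs(a - b) for a, b in zip(g, p)) / len(g))
    return sum(maes) / len(maes)


# Hypothetical gold and predicted values for two words and two features.
gold = {"nFix": [10.0, 20.0], "FFD": [5.0, 5.0]}
pred = {"nFix": [12.0, 18.0], "FFD": [4.0, 7.0]}
print(average_mae(gold, pred))  # 1.75
```

Here the per-feature MAEs are 2.0 and 1.5, so their mean is 1.75. Features measured on larger scales contribute more to such an average unless the values are normalised beforehand.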

LangResearchLab NC at SemEval-2021 Task 1: Linguistic Feature Based Modelling for Lexical Complexity
Raksha Agarwal | Niladri Chatterjee
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

The present work aims at assigning a complexity score between 0 and 1 to a target word or phrase in a given sentence. For each Single Word Target, a Random Forest Regressor is trained on a feature set consisting of lexical, semantic, and syntactic information about the target. For each Multiword Target, a set of individual word features is taken along with single word complexities in the feature space. The system yielded Pearson correlations of 0.7402 and 0.8244 on the test set for the Single Word and Multiword Targets, respectively.
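
The results above are reported as Pearson correlations between predicted and gold complexity scores. The Python sketch below implements the coefficient. The function name and example scores are hypothetical, and the inputs are assumed to be non-constant so that the denominator is non-zero.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # covariance and variances, all left unnormalised (the 1/n factors cancel)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


predicted = [0.1, 0.2, 0.3]
gold = [0.2, 0.4, 0.6]
print(round(pearson(predicted, gold), 4))  # 1.0
```

Pearson correlation rewards predictions that rise and fall with the gold scores, even if they are offset or scaled. This is why it is often reported alongside an absolute-error measure.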

LangResearchLab_NC at CMCL2021 Shared Task: Predicting Gaze Behaviour Using Linguistic Features and Tree Regressors
Raksha Agarwal | Niladri Chatterjee
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

Analysis of gaze data behaviour has gained momentum in recent years for different NLP applications. The present paper aims at modelling the gaze data behaviour of tokens in the context of a sentence. We have experimented with various Machine Learning Regression Algorithms on a feature space comprising the linguistic features of the target tokens to predict five Eye-Tracking features. The CatBoost Regressor performed best and achieved fourth position in terms of MAE-based evaluation on the ZuCo dataset.

NARNIA at NLP4IF-2021: Identification of Misinformation in COVID-19 Tweets Using BERTweet
Ankit Kumar | Naman Jhunjhunwala | Raksha Agarwal | Niladri Chatterjee
Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda

The spread of COVID-19 has been accompanied by widespread misinformation on social media. In particular, the Twitterverse has seen a huge increase in the dissemination of distorted facts and figures. The present work aims at identifying tweets regarding COVID-19 that contain harmful and false information. We have experimented with a number of Deep Learning-based models, including different word embeddings such as GloVe and ELMo, among others. The BERTweet model achieved the best overall F1-score of 0.881 and secured third rank on the above task.

2020

LangResearchLab_NC at FinCausal 2020, Task 1: A Knowledge Induced Neural Net for Causality Detection
Raksha Agarwal | Ishaan Verma | Niladri Chatterjee
Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation

Identifying causal relationships in a text is essential for achieving comprehensive natural language understanding. The present work proposes combining features derived from pre-trained BERT with linguistic features to train a supervised classifier for the task of Causality Detection. The linguistic features help to inject knowledge about the semantic and syntactic structure of the input sentences. Experiments on the FinCausal Shared Task 1 datasets indicate that combining linguistic features with BERT improves overall performance for causality detection. The proposed system achieves a weighted average F1 score of 0.952 on the post-evaluation dataset.

2006

Parsing Aligned Parallel Corpus by Projecting Syntactic Relations from Annotated Source Corpus
Shailly Goyal | Niladri Chatterjee
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

2003

Identification of divergence for English to Hindi EBMT
Deepa Gupta | Niladri Chatterjee
Proceedings of Machine Translation Summit IX: Papers

Divergence is a key aspect of translation between two languages. Divergence occurs when structurally similar sentences of the source language do not translate into structurally similar sentences in the target language. Divergence assumes special significance in the domain of Example-Based Machine Translation (EBMT). An EBMT system generates the translation of a given sentence by retrieving similar past translation examples from its example base and then adapting them suitably to meet the current translation requirements. Divergence poses a great challenge to the success of EBMT. The present work provides a technique for identifying divergence without going into the semantic details of the underlying sentences. This identification helps in partitioning the example database into divergence and non-divergence categories, which in turn should facilitate efficient retrieval and adaptation in an EBMT system.