2022
Summarization of Long Input Texts Using Multi-Layer Neural Network
Niladri Chatterjee | Aadyant Khatri | Raksha Agarwal
Proceedings of The Workshop on Automatic Summarization for Creative Writing
This paper describes the architecture of a novel Multi-Layer Long Text Summarizer (MLLTS) system proposed for the task of creative writing summarization. Typically, such writings are very long, often spanning over 100 pages. Summarizers available online are either not equipped to handle long texts or, even when they can generate a summary, produce output of poor quality. The proposed MLLTS system handles the difficulty by splitting the text into several parts. Each part is then fed to different existing summarizers. A multilayer network is constructed by establishing linkages between the different parts. Several hyperparameters are fine-tuned during training. The system achieved very good ROUGE scores on the test data supplied for the contest.
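A minimal sketch of the split-then-summarize idea is given below. It is not the authors' MLLTS architecture: the multilayer linkage between parts is abstracted away into simple concatenation, and the `summarizers` argument stands in for whichever existing systems the parts are routed to.

```python
from typing import Callable, List

def split_into_parts(text: str, part_size: int = 3000) -> List[str]:
    """Split a long text into roughly part_size-character chunks, preferring sentence boundaries."""
    parts, start = [], 0
    while start < len(text):
        end = min(start + part_size, len(text))
        cut = text.rfind(".", start, end)  # back off to the last full stop, if any
        if cut > start:
            end = cut + 1
        parts.append(text[start:end].strip())
        start = end
    return parts

def summarize_long_text(text: str, summarizers: List[Callable[[str], str]]) -> str:
    """Route each part to one of several summarizers (round-robin) and join the outputs."""
    parts = split_into_parts(text)
    partials = [summarizers[i % len(summarizers)](part) for i, part in enumerate(parts)]
    return " ".join(partials)
```

Usage would be something like `summarize_long_text(book_text, [extractive_fn, abstractive_fn])`, where each function maps a chunk of text to its summary.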
2021
LangResearchLab NC at SemEval-2021 Task 1: Linguistic Feature Based Modelling for Lexical Complexity
Raksha Agarwal | Niladri Chatterjee
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
The present work aims at assigning a complexity score between 0 and 1 to a target word or phrase in a given sentence. For each Single Word Target, a Random Forest Regressor is trained on a feature set consisting of lexical, semantic, and syntactic information about the target. For each Multiword Target, individual word features are combined with the single-word complexities in the feature space. The system yielded Pearson correlations of 0.7402 and 0.8244 on the test set for the Single and Multiword Targets, respectively.
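A minimal sketch of the Single Word Target pipeline, assuming precomputed feature vectors: the synthetic feature dimensions below stand in for the lexical, semantic, and syntactic features described in the paper.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_train = rng.random((500, 12))   # e.g. word length, frequency, POS one-hots, ...
y_train = rng.random(500)         # gold complexity scores in [0, 1]
X_test, y_test = rng.random((100, 12)), rng.random(100)

model = RandomForestRegressor(n_estimators=300, random_state=0)
model.fit(X_train, y_train)
preds = np.clip(model.predict(X_test), 0.0, 1.0)  # keep predictions in [0, 1]
print("Pearson r:", pearsonr(y_test, preds)[0])
```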
NARNIA at NLP4IF-2021: Identification of Misinformation in COVID-19 Tweets Using BERTweet
Ankit Kumar | Naman Jhunjhunwala | Raksha Agarwal | Niladri Chatterjee
Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda
The spread of COVID-19 has been accompanied by widespread misinformation on social media. In particular, the Twitterverse has seen a huge increase in the dissemination of distorted facts and figures. The present work aims at identifying tweets regarding COVID-19 that contain harmful and false information. We experimented with a number of Deep Learning-based models using different word embeddings, such as GloVe and ELMo, among others. The BERTweet model achieved the best overall F1-score of 0.881 and secured third rank on the task.
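A minimal sketch of the model wiring with BERTweet via Hugging Face `transformers`; the training data, preprocessing, and hyperparameters from the paper are omitted, and the binary label set here is only illustrative.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base", use_fast=False)
model = AutoModelForSequenceClassification.from_pretrained(
    "vinai/bertweet-base", num_labels=2)  # illustrative: harmful/false vs. not

tweets = ["Garlic water cures COVID-19!", "Stay home and wash your hands."]
batch = tokenizer(tweets, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits
print(logits.softmax(dim=-1))  # classification head is untrained: fine-tune before real use
```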
LangResearchLab_NC at CMCL2021 Shared Task: Predicting Gaze Behaviour Using Linguistic Features and Tree Regressors
Raksha Agarwal | Niladri Chatterjee
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Analysis of gaze behaviour has gained momentum in recent years for different NLP applications. The present paper aims at modelling the gaze behaviour of tokens in the context of a sentence. We experimented with various Machine Learning regression algorithms on a feature space comprising linguistic features of the target tokens for the prediction of five eye-tracking features. The CatBoost Regressor performed best, achieving fourth position in terms of MAE-based accuracy on the ZuCo dataset.
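A minimal sketch of the best-performing route, on synthetic data: one MAE-objective CatBoost regressor per eye-tracking feature. The target names below follow ZuCo-style conventions and the feature matrix is a toy stand-in for the linguistic features used in the paper.

```python
import numpy as np
from catboost import CatBoostRegressor

rng = np.random.default_rng(0)
X = rng.random((400, 8))  # e.g. token length, frequency, sentence position, ...
targets = {name: rng.random(400)
           for name in ["nFix", "FFD", "GPT", "TRT", "fixProp"]}

models = {}
for name, y in targets.items():
    m = CatBoostRegressor(loss_function="MAE", iterations=200, verbose=False)
    m.fit(X, y)
    models[name] = m
print({name: m.predict(X[:1])[0] for name, m in models.items()})
```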
MTL782_IITD at CMCL 2021 Shared Task: Prediction of Eye-Tracking Features Using BERT Embeddings and Linguistic Features
Shivani Choudhary | Kushagri Tandon | Raksha Agarwal | Niladri Chatterjee
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Reading and comprehension are quintessentially cognitive tasks. Eye movement acts as a surrogate for understanding which parts of a sentence are critical to the process of comprehension. The aim of the shared task is to predict five eye-tracking features for a given word of an input sentence. We experimented with several models based on LGBM (Light Gradient Boosting Machine) regression, ANN (Artificial Neural Network), and CNN (Convolutional Neural Network) architectures, using BERT embeddings and combinations of linguistic features. Our CNN-based submission achieved an average MAE of 4.0639 and ranked 7th in the shared task. The average MAE was further lowered to 3.994 in post-task evaluation.
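A minimal sketch of one of the explored routes: a word's BERT embedding concatenated with a simple linguistic feature (word length), fed to an LGBM regressor. The data below is synthetic, the subword-to-word alignment is a crude assumption, and the CNN variant that produced the submitted run is not shown.

```python
import numpy as np
import torch
from lightgbm import LGBMRegressor
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def word_vector(sentence: str, word_index: int) -> np.ndarray:
    """Average the BERT subword states belonging to the word at word_index."""
    enc = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        states = bert(**enc).last_hidden_state.squeeze(0)  # (seq_len, 768)
    rows = [i for i, w in enumerate(enc.word_ids(0)) if w == word_index]
    return states[rows].mean(dim=0).numpy()

sentence = "Reading and comprehension are quintessentially cognitive tasks ."
words = sentence.split()
X = np.stack([np.concatenate([word_vector(sentence, i), [len(w)]])  # + word length
              for i, w in enumerate(words)])
y = np.random.default_rng(0).random(len(words))  # placeholder for one gaze feature
LGBMRegressor(n_estimators=50, min_child_samples=1).fit(X, y)
```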
2020
LangResearchLab_NC at FinCausal 2020, Task 1: A Knowledge Induced Neural Net for Causality Detection
Raksha Agarwal | Ishaan Verma | Niladri Chatterjee
Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation
Identifying causal relationships in a text is essential for achieving comprehensive natural language understanding. The present work proposes a combination of features derived from pre-trained BERT with linguistic features for training a supervised classifier for the task of Causality Detection. The linguistic features help inject knowledge about the semantic and syntactic structure of the input sentences into the model. Experiments on the FinCausal Shared Task 1 datasets indicate that combining linguistic features with BERT improves overall performance for causality detection. The proposed system achieves a weighted average F1-score of 0.952 on the post-evaluation dataset.
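A minimal sketch of the feature combination with an illustrative classifier: a sentence's BERT [CLS] vector concatenated with hand-crafted linguistic cues. The cue list here is a toy stand-in for the richer semantic and syntactic features in the paper, and the two-sentence dataset exists only to make the snippet runnable.

```python
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def featurize(sentence: str) -> np.ndarray:
    """Concatenate the BERT [CLS] vector with simple hand-crafted linguistic cues."""
    batch = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        cls = bert(**batch).last_hidden_state[:, 0, :].squeeze(0).numpy()
    linguistic = np.array([float("because" in sentence.lower()),  # causal connective cues
                           float("due to" in sentence.lower()),
                           len(sentence.split())])
    return np.concatenate([cls, linguistic])

sents = ["Profits fell because demand dropped.", "The meeting is on Monday."]
labels = [1, 0]  # 1 = causal, 0 = non-causal
X = np.stack([featurize(s) for s in sents])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X))
```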