Sandeep Mathias

2025

PresiUniv at FinCausal 2025 Shared Task: Applying Fine-tuned Language Models to Explain Financial Cause and Effect with Zero-shot Learning
Medha Jeenoor | Madiha Aziz | Saipriya Dipika Vaidyanathan | Avijit Samantraya | Sandeep Mathias
Proceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal)

Transformer-based multilingual question-answering models are used to detect causality in financial text data. This study employs BERT (CITATION) for English text and XLM-RoBERTa (CITATION) for Spanish data, which were fine-tuned on the SQuAD datasets (CITATION) (CITATION). These pre-trained models are used to extract answers to the targeted questions. We design a system using these pre-trained models to answer questions, based on the given context. The results validate the effectiveness of the systems in understanding nuanced financial language and offers a tool for multi-lingual text analysis. Our system is able to achieve SAS scores of 0.75 in Spanish and 0.82 in English.

2024

pdf bib abs

ISEP_Presidency_University at MLSP 2024 Shared Task: Using GPT-3.5 to Generate Substitutes for Lexical Simplification
Benjamin Dutilleul | Mathis Debaillon | Sandeep Mathias
Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)

Lexical substitute generation is a task where we generate substitutes for a given word to fit in the required context. It is one of the main steps for automatic lexical simplifcation. In this paper, we introduce an automatic lexical simplification system using the GPT-3 large language model. The system generates simplified candidate substitutions for complex words to aid readability and comprehension for the reader. The paper describes the system that we submitted for the Multilingual Lexical Simplification Pipeline Shared Task at the 2024 BEA Workshop. During the shared task, we experimented with Catalan, English, French, Italian, Portuguese, and German for the Lexical Simplification Shared Task. We achieved the best results in Catalan and Portuguese, and were runners-up in English, French and Italian. To further research in this domain, we also release our code upon acceptance of the paper.

2022

pdf bib abs

Many Hands Make Light Work: Using Essay Traits to Automatically Score Essays
Rahul Kumar | Sandeep Mathias | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Most research in the area of automatic essay grading (AEG) is geared towards scoring the essay holistically while there has also been little work done on scoring individual essay traits. In this paper, we describe a way to score essays using a multi-task learning (MTL) approach, where scoring the essay holistically is the primary task, and scoring the essay traits is the auxiliary task. We compare our results with a single-task learning (STL) approach, using both LSTMs and BiLSTMs. To find out which traits work best for different types of essays, we conduct ablation tests for each of the essay traits. We also report the runtime and number of training parameters for each system. We find that MTL-based BiLSTM system gives the best results for scoring the essay holistically, as well as performing well on scoring the essay traits. The MTL systems also give a speed-up of between 2.30 to 3.70 times the speed of the STL system, when it comes to scoring the essay and all the traits.

pdf bib abs

PresiUniv at TSAR-2022 Shared Task: Generation and Ranking of Simplification Substitutes of Complex Words in Multiple Languages
Peniel Whistely | Sandeep Mathias | Galiveeti Poornima
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)

In this paper, we describe our approach to generate and rank candidate simplifications using pre-trained language models (Eg. BERT), publicly available word embeddings (Eg. FastText), and a part-of-speech tagger, to generate and rank candidate contextual simplifications for a given complex word. In this task, our system, PresiUniv, was placed first in the Spanish track, 5th in the Brazilian-Portuguese track, and 10th in the English track. We upload our codes and data for this project to aid in replication of our results. We also analyze some of the errors and describe design decisions which we took while writing the paper.

2020

pdf bib abs

Happy Are Those Who Grade without Seeing: A Multi-Task Learning Approach to Grade Essays Using Gaze Behaviour
Sandeep Mathias | Rudra Murthy | Diptesh Kanojia | Abhijit Mishra | Pushpak Bhattacharyya
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

The gaze behaviour of a reader is helpful in solving several NLP tasks such as automatic essay grading. However, collecting gaze behaviour from readers is costly in terms of time and money. In this paper, we propose a way to improve automatic essay grading using gaze behaviour, which is learnt at run time using a multi-task learning framework. To demonstrate the efficacy of this multi-task learning based approach to automatic essay grading, we collect gaze behaviour for 48 essays across 4 essay sets, and learn gaze behaviour for the rest of the essays, numbering over 7000 essays. Using the learnt gaze behaviour, we can achieve a statistically significant improvement in performance over the state-of-the-art system for the essay sets where we have gaze data. We also achieve a statistically significant improvement for 4 other essay sets, numbering about 6000 essays, where we have no gaze behaviour data available. Our approach establishes that learning gaze behaviour improves automatic essay grading.

pdf bib abs

Can Neural Networks Automatically Score Essay Traits?
Sandeep Mathias | Pushpak Bhattacharyya
Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications

Essay traits are attributes of an essay that can help explain how well written (or badly written) the essay is. Examples of traits include Content, Organization, Language, Sentence Fluency, Word Choice, etc. A lot of research in the last decade has dealt with automatic holistic essay scoring - where a machine rates an essay and gives a score for the essay. However, writers need feedback, especially if they want to improve their writing - which is why trait-scoring is important. In this paper, we show how a deep-learning based system can outperform feature-based machine learning systems, as well as a string kernel system in scoring essay traits.

pdf bib abs

Cognitively Aided Zero-Shot Automatic Essay Grading
Sandeep Mathias | Rudra Murthy | Diptesh Kanojia | Pushpak Bhattacharyya
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

Automatic essay grading (AEG) is a process in which machines assign a grade to an essay written in response to a topic, called the prompt. Zero-shot AEG is when we train a system to grade essays written to a new prompt which was not present in our training data. In this paper, we describe a solution to the problem of zero-shot automatic essay grading, using cognitive information, in the form of gaze behaviour. Our experiments show that using gaze behaviour helps in improving the performance of AEG systems, especially when we provide a new essay written in response to a new prompt for scoring, by an average of almost 5 percentage points of QWK.

2018

pdf bib

ASAP++: Enriching the ASAP Automated Essay Grading Dataset with Essay Attribute Scores
Sandeep Mathias | Pushpak Bhattacharyya
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib abs

Eyes are the Windows to the Soul: Predicting the Rating of Text Quality Using Gaze Behaviour
Sandeep Mathias | Diptesh Kanojia | Kevin Patel | Samarth Agrawal | Abhijit Mishra | Pushpak Bhattacharyya
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Predicting a reader’s rating of text quality is a challenging task that involves estimating different subjective aspects of the text, like structure, clarity, etc. Such subjective aspects are better handled using cognitive information. One such source of cognitive information is gaze behaviour. In this paper, we show that gaze behaviour does indeed help in effectively predicting the rating of text quality. To do this, we first we model text quality as a function of three properties - organization, coherence and cohesion. Then, we demonstrate how capturing gaze behaviour helps in predicting each of these properties, and hence the overall quality, by reporting improvements obtained by adding gaze features to traditional textual features for score prediction. We also hypothesize that if a reader has fully understood the text, the corresponding gaze behaviour would give a better indication of the assigned rating, as opposed to partial understanding. Our experiments validate this hypothesis by showing greater agreement between the given rating and the predicted rating when the reader has a full understanding of the text.

pdf bib abs

The Whole is Greater than the Sum of its Parts: Towards the Effectiveness of Voting Ensemble Classifiers for Complex Word Identification
Nikhil Wani | Sandeep Mathias | Jayashree Aanand Gajjam | Pushpak Bhattacharyya
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications

In this paper, we present an effective system using voting ensemble classifiers to detect contextually complex words for non-native English speakers. To make the final decision, we channel a set of eight calibrated classifiers based on lexical, size and vocabulary features and train our model with annotated datasets collected from a mixture of native and non-native speakers. Thereafter, we test our system on three datasets namely News, WikiNews, and Wikipedia and report competitive results with an F1-Score ranging between 0.777 to 0.855 for each of the datasets. Our system outperforms multiple other models and falls within 0.042 to 0.026 percent of the best-performing model’s score in the shared task.

pdf bib abs

Thank “Goodness”! A Way to Measure Style in Student Essays
Sandeep Mathias | Pushpak Bhattacharyya
Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications

Essays have two major components for scoring - content and style. In this paper, we describe a property of the essay, called goodness, and use it to predict the score given for the style of student essays. We compare our approach to solve this problem with baseline approaches, like language modeling and also a state-of-the-art deep learning system. We show that, despite being quite intuitive, our approach is very powerful in predicting the style of the essays.