Swapnil Hingmire
2025
Objectifying the Subjective: Cognitive Biases in Topic Interpretations
Swapnil Hingmire | Ze Shi Li | Shiyu (Vivienne) Zeng | Ahmed Musa Awon | Luiz Franciscatto Guerra | Neil Ernst
Transactions of the Association for Computational Linguistics, Volume 13
Interpretation of topics is crucial for their downstream applications. State-of-the-art evaluation measures of topic quality such as coherence and word intrusion do not measure how much a topic facilitates the exploration of a corpus. To design evaluation measures grounded in a task and a population of users, we conduct user studies to understand how users interpret topics. We propose constructs of topic quality and ask users to assess them in the context of a topic and provide the rationale behind their evaluations. We use reflexive thematic analysis to identify themes of topic interpretations from rationales. Users interpret topics based on availability and representativeness heuristics rather than probability. We propose a theory of topic interpretation based on the anchoring-and-adjustment heuristic: users anchor on salient words and make semantic adjustments to arrive at an interpretation. Topic interpretation can be viewed as making a judgment under uncertainty by an ecologically rational user, and hence cognitive-bias-aware user models and evaluation frameworks are needed.
DoDS-IITPKD: Submissions to the WMT25 Low-Resource Indic Language Translation Task
Ontiwell Khongthaw | G.l. Salvin | Shrikant Budde | Abigairl Chigwededza | Dhruvadeep Malkar | Swapnil Hingmire
Proceedings of the Tenth Conference on Machine Translation
Low-resource translation for Indic languages poses significant challenges due to limited parallel corpora and linguistic diversity. In this work, we describe our participation in the WMT 2025 shared task for four Indic languages: Khasi, Mizo, and Assamese, which fall under Category 1, and Bodo, which falls under Category 2. For our PRIMARY submission, we fine-tuned the distilled NLLB-200 model on bidirectional English↔Khasi and English↔Mizo data, and employed the IndicTrans2 model family for Assamese and Bodo translation. Our CONTRASTIVE submission augments training with external corpora from PMINDIA and Google SMOL to further enrich low-resource data coverage. Both systems leverage Low-Rank Adaptation (LoRA) within a parameter-efficient fine-tuning framework, enabling lightweight adapter training atop frozen pretrained weights. The translation pipeline was developed using the Hugging Face Transformers and PEFT libraries, augmented with bespoke preprocessing modules that append both language and domain identifiers to each instance. We evaluated our approach on parallel corpora spanning multiple domains (article-based, newswire, scientific, and biblical texts) as provided by the WMT25 dataset, under conditions of severe data scarcity. Fine-tuning lightweight LoRA adapters on targeted parallel corpora yields marked improvements in evaluation metrics, confirming their effectiveness for cross-domain adaptation in low-resource Indic languages.
2023
A Transfer Learning Pipeline for Educational Resource Discovery with Application in Survey Generation
Irene Li | Thomas George | Alex Fabbri | Tammy Liao | Benjamin Chen | Rina Kawamura | Richard Zhou | Vanessa Yan | Swapnil Hingmire | Dragomir Radev
Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)
Effective human learning depends on a wide selection of educational materials that align with the learner’s current understanding of the topic. While the Internet has revolutionized human learning or education, a substantial resource accessibility barrier still exists. Namely, the excess of online information can make it challenging to navigate and discover high-quality learning materials in a given subject area. In this paper, we propose an automatic pipeline for building an educational resource discovery system for new domains. The pipeline consists of three main steps: resource searching, feature extraction, and resource classification. We first collect frequent queries from a set of seed documents, and search the web with these queries to obtain candidate resources such as lecture slides and introductory blog posts. Then, we process these resources for BERT-based features and meta-features. Next, we train a tree-based classifier to decide whether they are suitable learning materials. The pipeline achieves F1 scores of 0.94 and 0.82 when evaluated on two similar but novel domains. Finally, we demonstrate how this pipeline can benefit two applications: prerequisite chain learning and leading paragraph generation for surveys. We also release a corpus of 39,728 manually labeled web resources and 659 queries from NLP, Computer Vision (CV), and Statistics (STATS).
2021
Extracting Events from Industrial Incident Reports
Nitin Ramrakhiyani | Swapnil Hingmire | Sangameshwar Patil | Alok Kumar | Girish Palshikar
Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021)
Incidents in industries have a huge social and political impact, and minimizing the consequent damage has been a high priority. However, automated analysis of repositories of incident reports has remained a challenge. In this paper, we focus on automatically extracting events from incident reports. Due to the absence of event-annotated datasets for industrial incidents, we employ a transfer-learning-based approach, which is shown to outperform several baselines. We further provide a detailed analysis of the effect of increasing the pre-training data and explain why pre-training improves performance.
2020
R-VGAE: Relational-variational Graph Autoencoder for Unsupervised Prerequisite Chain Learning
Irene Li | Alexander Fabbri | Swapnil Hingmire | Dragomir Radev
Proceedings of the 28th International Conference on Computational Linguistics
The task of concept prerequisite chain learning is to automatically determine the existence of prerequisite relationships among concept pairs. In this paper, we frame learning prerequisite relationships among concepts as an unsupervised task with no access to labeled concept pairs during training. We propose a model called the Relational-Variational Graph AutoEncoder (R-VGAE) to predict concept relations within a graph consisting of concept and resource nodes. Results show that our unsupervised approach outperforms graph-based semi-supervised methods and other baseline methods by up to 9.77% and 10.47% in terms of prerequisite relation prediction accuracy and F1 score. Our method is notably the first graph-based model that attempts to make use of deep learning representations for the task of unsupervised prerequisite learning. We also expand an existing corpus which totals 1,717 English Natural Language Processing (NLP)-related lecture slide files and manual concept pair annotations over 322 topics.
Extracting Message Sequence Charts from Hindi Narrative Text
Swapnil Hingmire | Nitin Ramrakhiyani | Avinash Kumar Singh | Sangameshwar Patil | Girish Palshikar | Pushpak Bhattacharyya | Vasudeva Varma
Proceedings of the First Joint Workshop on Narrative Understanding, Storylines, and Events
In this paper, we propose the use of Message Sequence Charts (MSC) as a representation for visualizing narrative text in Hindi. An MSC is a formal representation allowing the depiction of actors and interactions among these actors in a scenario, apart from supporting a rich framework for formal inference. We propose an approach to extract MSC actors and interactions from a Hindi narrative. As a part of the approach, we enrich an existing event annotation scheme where we provide guidelines for annotation of the mood of events (realis vs irrealis) and guidelines for annotation of event arguments. We report performance on multiple evaluation criteria by experimenting with Hindi narratives from Indian History. Though Hindi is the fourth most-spoken first language in the world, from the NLP perspective it has comparatively fewer resources than English. Moreover, there is relatively little work on event processing in Hindi. Hence, we believe that this work is among the initial works for Hindi event processing.
2019
Extraction of Message Sequence Charts from Software Use-Case Descriptions
Girish Palshikar | Nitin Ramrakhiyani | Sangameshwar Patil | Sachin Pawar | Swapnil Hingmire | Vasudeva Varma | Pushpak Bhattacharyya
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers)
Software Requirement Specification documents provide natural language descriptions of the core functional requirements as a set of use-cases. Essentially, each use-case contains a set of actors and sequences of steps describing the interactions among them. Goals of use-case reviews and analyses include their correctness, completeness, detection of ambiguities, prototyping, verification, test case generation and traceability. Message Sequence Charts (MSCs) have been proposed as an expressive, rigorous yet intuitive visual representation of use-cases. In this paper, we describe a linguistic knowledge-based approach to extract MSCs from use-cases. Compared to existing techniques, we extract richer constructs of the MSC notation such as timers, conditions and alt-boxes. We apply this tool to extract MSCs from several real-life software use-case descriptions and show that it performs better than the existing techniques. We also discuss the benefits and limitations of the extracted MSCs to meet the above goals.
Extraction of Message Sequence Charts from Narrative History Text
Girish Palshikar | Sachin Pawar | Sangameshwar Patil | Swapnil Hingmire | Nitin Ramrakhiyani | Harsimran Bedi | Pushpak Bhattacharyya | Vasudeva Varma
Proceedings of the First Workshop on Narrative Understanding
In this paper, we advocate the use of the Message Sequence Chart (MSC) as a knowledge representation to capture and visualize multi-actor interactions and their temporal ordering. We propose algorithms to automatically extract an MSC from a history narrative. For a given narrative, we first identify verbs which indicate interactions and then use dependency parsing and Semantic Role Labelling based approaches to identify senders (initiating actors) and receivers (other actors involved) for these interaction verbs. As a final step in MSC extraction, we employ a state-of-the-art algorithm to temporally re-order these interactions. Our evaluation on multiple publicly available narratives shows improvements over four baselines.
2018
Resolving Actor Coreferences in Hindi Narrative Text
Nitin Ramrakhiyani | Swapnil Hingmire | Sachin Pawar | Sangameshwar Patil | Girish K. Palshikar | Pushpak Bhattacharyya | Vasudeva Verma
Proceedings of the 15th International Conference on Natural Language Processing
Identification of Alias Links among Participants in Narratives
Sangameshwar Patil | Sachin Pawar | Swapnil Hingmire | Girish Palshikar | Vasudeva Varma | Pushpak Bhattacharyya
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Identification of distinct and independent participants (entities of interest) in a narrative is an important task for many NLP applications. This task becomes challenging because these participants are often referred to using multiple aliases. In this paper, we propose an approach based on linguistic knowledge for identification of aliases mentioned using proper nouns, pronouns or noun phrases with a common noun headword. We use a Markov Logic Network (MLN) to encode the linguistic knowledge for identification of aliases. We evaluate on four diverse history narratives of varying complexity. Our approach performs better than the state-of-the-art approach as well as a combination of standard named entity recognition and coreference resolution techniques.
2017
Measuring Topic Coherence through Optimal Word Buckets
Nitin Ramrakhiyani | Sachin Pawar | Swapnil Hingmire | Girish Palshikar
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers
Measuring topic quality is essential for scoring the learned topics and their subsequent use in information retrieval and text classification. To measure the quality of Latent Dirichlet Allocation (LDA) based topics learned from text, we propose a novel approach based on grouping of topic words into buckets (TBuckets). A single large bucket signifies a single coherent theme, in turn indicating high topic coherence. TBuckets uses word embeddings of topic words and employs singular value decomposition (SVD) and Integer Linear Programming based optimization to create coherent word buckets. TBuckets outperforms the state-of-the-art techniques when evaluated on three publicly available datasets and on another one proposed in this paper.
Event Timeline Generation from History Textbooks
Harsimran Bedi | Sangameshwar Patil | Swapnil Hingmire | Girish Palshikar
Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017)
An event timeline serves as the basic structure of history, and it is used as a disposition of key phenomena in studying history as a subject in secondary school. In order to enable a student to understand a historical phenomenon as a series of connected events, we present a system for automatic event timeline generation from history textbooks. Additionally, we propose Message Sequence Chart (MSC) and time-map based visualization techniques to visualize an event timeline. We also identify key computational challenges in developing natural language processing based applications for history textbooks.
Experiments with Domain Dependent Dialogue Act Classification using Open-Domain Dialogue Corpora
Swapnil Hingmire | Apoorv Shrivastava | Girish Palshikar | Saurabh Srivastava
Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017)
2015
Noun Phrase Chunking for Marathi using Distant Supervision
Sachin Pawar | Nitin Ramrakhiyani | Girish K. Palshikar | Pushpak Bhattacharyya | Swapnil Hingmire
Proceedings of the 12th International Conference on Natural Language Processing
2014
Co-authors
- Girish Palshikar 11
- Sangameshwar Patil 7
- Sachin Pawar 7
- Nitin Ramrakhiyani 7
- Pushpak Bhattacharyya 6
- Vasudeva Varma 4
- Harsimran Bedi 2
- Irene Li 2
- Dragomir Radev 2
- Ahmed Musa Awon 1
- Shrikant Budde 1
- Sutanu Chakraborti 1
- Benjamin Chen 1
- Abigairl Chigwededza 1
- Neil Ernst 1
- Alex Fabbri 1
- Thomas George 1
- Luiz Franciscatto Guerra 1
- Rina Kawamura 1
- Ontiwell Khongthaw 1
- Alok Kumar 1
- Ze Shi Li 1
- Tammy Liao 1
- Dhruvadeep Malkar 1
- Alexander Richard Fabbri 1
- G.l. Salvin 1
- Apoorv Shrivastava 1
- Avinash Kumar Singh 1
- Saurabh Srivastava 1
- Vasudeva Verma 1
- Vanessa Yan 1
- Shiyu (Vivienne) Zeng 1
- Richard Zhou 1