Sonawane Sheetal


2023

pdf bib
Query-Based Summarization and Sentiment Analysis for Indian Financial Text by leveraging Dense Passage Retriever, RoBERTa, and FinBERT
Shaikh Numair | Patil Jayesh | Sonawane Sheetal
Proceedings of the 20th International Conference on Natural Language Processing (ICON)

With the ever-expanding pool of information accessible on the Internet, it has become increasingly challenging for readers to sift through voluminous data and derive meaningful insights. This is particularly noteworthy and critical in the context of documents such as financial reports and large-scale media reports. In the realm of finance, documents are typically lengthy and comprise numerical values. This research delves into the extraction of insights through text summaries from financial data, based on the user’s interests, and the identification of clues from these insights. This research presents a straightforward, allencompassing framework for conducting querybased summarization of financial documents, as well as analyzing the sentiment of the summary. The system’s performance is evaluated using benchmarked metrics, and it is compared to State-of-The-Art (SoTA) algorithms. Extensive experimentation indicates that the proposed system surpasses existing pre-trained language models.

pdf bib
CASM - Context and Something More in Lexical Simplification
Kumbhar Atharva | Sonawane Sheetal | Kadam Dipali | Mulay Prathamesh
Proceedings of the 20th International Conference on Natural Language Processing (ICON)

Lexical Simplification is a challenging task that aims to improve the readability of text for nonnative people, people with dyslexia, and any linguistic impairments. It consists of 3 components: 1) Complex Word Identification 2) Substitute Generation 3) Substitute Ranking. Current methods use contextual information as a primary source in all three stages of the simplification pipeline. We argue that while context is an important measure, it alone is not sufficient in the process. In the complex word identification step, contextual information is inadequate, moreover, heavy feature engineering is required to use additional linguistic features. This paper presents a novel architecture for complex word identification that uses a pre-trained transformer model’s information flow through its hidden layers as a feature representation that implicitly encodes all the features required for identification. We portray how database methods and masked language modeling can be complementary to one another in substitute generation and ranking process that is built on the foundational pillars of Simplicity, Grammatical and Semantic correctness, and context preservation. We show that our proposed model generalizes well and outperforms the current state-of-the-art on wellknown datasets.

pdf bib
The Current Landscape of Multimodal Summarization
Kumbhar Atharva | Kulkarni Harsh | Mali Atmaja | Sonawane Sheetal | Mulay Prathamesh
Proceedings of the 20th International Conference on Natural Language Processing (ICON)

In recent years, the rise of multimedia content on the internet has inundated users with a vast and diverse array of information, including images, videos, and textual data. Handling this flood of multimedia data necessitates advanced techniques capable of distilling this wealth of information into concise, meaningful summaries. Multimodal summarization, which involves generating summaries from multiple modalities such as text, images, and videos, has become a pivotal area of research in natural language processing, computer vision, and multimedia analysis. This survey paper offers an overview of the state-of-the-art techniques, methodologies, and challenges in the domain of multimodal summarization. We highlight the interdisciplinary advancements made in this field specifically on the lines of two main frontiers:1) Multimodal Abstractive Summarization, and 2) Pre-training Language Models in Multimodal Summarization. By synthesizing insights from existing research, we aim to provide a holistic understanding of multimodal summarization techniques.