Prasenjit Dey


2024

pdf bib
On The Persona-based Summarization of Domain-Specific Documents
Ankan Mullick | Sombit Bose | Rounak Saha | Ayan Bhowmick | Pawan Goyal | Niloy Ganguly | Prasenjit Dey | Ravi Kokku
Findings of the Association for Computational Linguistics: ACL 2024

In an ever-expanding world of domain-specific knowledge, the increasing complexity of consuming, and storing information necessitates the generation of summaries from large information repositories. However, every persona of a domain has different requirements of information and hence their summarization. For example, in the healthcare domain, a persona-based (such as Doctor, Nurse, Patient etc.) approach is imperative to deliver targeted medical information efficiently. Persona-based summarization of domain-specific information by humans is a high cognitive load task and is generally not preferred. The summaries generated by two different humans have high variability and do not scale in cost and subject matter expertise as domains and personas grow. Further, AI-generated summaries using generic Large Language Models (LLMs) may not necessarily offer satisfactory accuracy for different domains unless they have been specifically trained on domain-specific data and can also be very expensive to use in day-to-day operations. Our contribution in this paper is two-fold: 1) We present an approach to efficiently fine-tune a domain-specific small foundation LLM using a healthcare corpus and also show that we can effectively evaluate the summarization quality using AI-based critiquing. 2) We further show that AI-based critiquing has good concordance with Human-based critiquing of the summaries. Hence, such AI-based pipelines to generate domain-specific persona-based summaries can be easily scaled to other domains such as legal, enterprise documents, education etc. in a very efficient and cost-effective manner.

2019

pdf bib
Content Customization for Micro Learning using Human Augmented AI Techniques
Ayush Shah | Tamer Abuelsaad | Jae-Wook Ahn | Prasenjit Dey | Ravi Kokku | Ruhi Sharma Mittal | Aditya Vempaty | Mourvi Sharma
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

Visual content has been proven to be effective for micro-learning compared to other media. In this paper, we discuss leveraging this observation in our efforts to build audio-visual content for young learners’ vocabulary learning. We attempt to tackle two major issues in the process of traditional visual curation tasks. Generic learning videos do not necessarily satisfy the unique context of a learner and/or an educator, and hence may not result in maximal learning outcomes. Also, manual video curation by educators is a highly labor-intensive process. To this end, we present a customizable micro-learning audio-visual content curation tool that is designed to reduce the human (educator) effort in creating just-in-time learning videos from a textual description (learning script). This provides educators with control of the content while preparing the learning scripts, and in turn can also be customized to capture the desired learning objectives and outcomes. As a use case, we automatically generate learning videos with British National Corpus’ (BNC) frequently spoken vocabulary words and evaluate them with experts. They positively recommended the generated learning videos with an average rating of 4.25 on a Likert scale of 5 points. The inter-annotator agreement between the experts for the video quality was substantial (Fleiss Kappa=0.62) with an overall agreement of 81%.

2016

pdf bib
A Framework for Mining Enterprise Risk and Risk Factors from News Documents
Tirthankar Dasgupta | Lipika Dey | Prasenjit Dey | Rupsa Saha
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

Any real world events or trends that can affect the company’s growth trajectory can be considered as risk. There has been a growing need to automatically identify, extract and analyze risk related statements from news events. In this demonstration, we will present a risk analytics framework that processes enterprise project management reports in the form of textual data and news documents and classify them into valid and invalid risk categories. The framework also extracts information from the text pertaining to the different categories of risks like their possible cause and impacts. Accordingly, we have used machine learning based techniques and studied different linguistic features like n-gram, POS, dependency, future timing, uncertainty factors in texts and their various combinations. A manual annotation study from management experts using risk descriptions collected for a specific organization was conducted to evaluate the framework. The evaluation showed promising results for automated risk analysis and identification.