Ankita Jain


2023

Evaluation Metrics for Depth and Flow of Knowledge in Non-fiction Narrative Texts
Sachin Pawar | Girish Palshikar | Ankita Jain | Mahesh Singh | Mahesh Rangarajan | Aman Agarwal | Vishal Kumar | Karan Singh
Proceedings of the 5th Workshop on Narrative Understanding

In this paper, we describe the problem of automatically evaluating the quality of knowledge expressed in a non-fiction narrative text. We focus on a specific type of document in which each document describes a certain technical problem and its solution. The goal is not only to evaluate the quality of knowledge in such a document, but also to automatically suggest possible improvements to the writer so that a more knowledge-rich document is produced. We propose new evaluation metrics to evaluate the quality of knowledge content as well as the flow of different types of sentences. The suggestions for improvement are generated based on these metrics. The proposed metrics are completely unsupervised in nature and are derived from a set of simple corpus statistics. In our experiments, we demonstrate the effectiveness of the proposed metrics compared to existing baseline metrics.

2020

Weak Supervision using Linguistic Knowledge for Information Extraction
Sachin Pawar | Girish Palshikar | Ankita Jain | Jyoti Bhat | Simi Johnson
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

In this paper, we propose to use linguistic knowledge to automatically augment a small manually annotated corpus to obtain a large annotated corpus for training Information Extraction models. We propose a powerful patterns specification language for specifying linguistic rules for entity extraction. We define an Enriched Text Format (ETF) to represent rich linguistic information about a text in the form of XML-like tags. The patterns in our patterns specification language are then matched against the ETF text rather than the raw text to extract various entity mentions. We demonstrate how an entity extraction system can be quickly built for a domain-specific entity type for which no annotated datasets are readily available.