Samiran Pal


2023

90% F1 Score in Relation Triple Extraction: Is it Real?
Pratik Saini | Samiran Pal | Tapas Nayak | Indrajit Bhattacharya
Proceedings of the 1st GenBench Workshop on (Benchmarking) Generalisation in NLP

Extracting relational triples from text is a crucial task for constructing knowledge bases. Recent advancements in joint entity and relation extraction models have demonstrated remarkable F1 scores (≥ 90%) in accurately extracting relational triples from free text. However, these models have been evaluated under restrictive experimental settings and unrealistic datasets: they overlook sentences with zero triples (zero-cardinality), thereby simplifying the task. In this paper, we present a benchmark study of state-of-the-art joint entity and relation extraction models under a more realistic setting. We include sentences that lack any triples in our experiments, providing a comprehensive evaluation. Our findings reveal a significant decline (approximately 10-15% on one dataset and 6-14% on another) in the models' F1 scores within this realistic experimental setup. Furthermore, we propose a two-step modeling approach that uses a simple BERT-based classifier, which leads to an overall performance improvement for these models within the realistic experimental setting.
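The two-step setup described in the abstract can be sketched roughly as follows: a sentence-level classifier first decides whether a sentence contains any triple, and only accepted sentences are passed to the joint extraction model. The sketch below is a minimal illustration using the HuggingFace transformers pipeline; the checkpoint and label names are placeholders, not the authors' released code.

```python
# Hedged sketch of the two-step approach: filter zero-cardinality sentences
# with a BERT-based classifier before running a joint entity/relation model.
from transformers import pipeline

# Step 1: binary sentence classifier (placeholder checkpoint; in practice this
# would be BERT fine-tuned on has-triple vs. zero-triple labels).
triple_detector = pipeline("text-classification", model="bert-base-uncased")

def extract_triples(sentence, joint_model):
    """Run the joint extraction model only on sentences predicted to contain a triple."""
    verdict = triple_detector(sentence)[0]
    if verdict["label"] == "ZERO_TRIPLE":  # hypothetical label name
        return []  # zero-cardinality sentence: emit no triples
    return joint_model(sentence)  # Step 2: any off-the-shelf joint extraction model
```

The filtering step is what restores part of the F1 lost when zero-triple sentences are included, since the downstream extractor never sees sentences the classifier rejects.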

2022

Weakly Supervised Context-based Interview Question Generation
Samiran Pal | Kaamraan Khan | Avinash Kumar Singh | Subhasish Ghosh | Tapas Nayak | Girish Palshikar | Indrajit Bhattacharya
Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)

We explore the task of automated generation of technical interview questions from a given textbook. Such questions differ from the reading-comprehension questions studied in the question generation literature. We curate a context-based interview question dataset for Machine Learning and Deep Learning from two popular textbooks. We first explore the possibility of using a large generative language model (GPT-3) for this task in a zero-shot setting. We then evaluate the performance of smaller generative models such as BART fine-tuned on weakly supervised data obtained using GPT-3 and hand-crafted templates. We deploy an automatic question importance assignment technique to assess the suitability of a question for a technical interview, which improves the evaluation results along several dimensions. We dissect the performance of these models on this task and also scrutinize the suitability of the questions they generate for use in technical interviews.
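As a rough illustration of the weak supervision step described above (not the authors' actual code), the (passage, question) pairs obtained from GPT-3 or hand-crafted templates could be encoded for BART fine-tuning along these lines; the checkpoint name and helper function are assumptions.

```python
# Hedged sketch: encode weakly supervised (passage, question) pairs for
# fine-tuning a smaller seq2seq model such as BART.
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")  # placeholder checkpoint
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

def encode_pair(passage, weak_question):
    """Textbook passage as encoder input, weakly supervised question as decoder target."""
    batch = tokenizer(passage, truncation=True, return_tensors="pt")
    batch["labels"] = tokenizer(weak_question, truncation=True, return_tensors="pt").input_ids
    return batch
```

Examples encoded this way can then be fed to a standard seq2seq training loop, with the question importance scores used to filter or weight the weakly supervised pairs.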