2025
Effective Modeling of Generative Framework for Document-level Relational Triple Extraction
Pratik Saini, Tapas Nayak
Proceedings of the Workshop on Generative AI and Knowledge Graphs (GenAIK)
Document-level relation triple extraction (DocRTE) is a complex task that involves three key sub-tasks: entity mention extraction, entity clustering, and relation triple extraction. Past work has applied discriminative models to these three sub-tasks, training them either sequentially in a pipeline or jointly. While end-to-end discriminative and generative models have proven effective for sentence-level relation triple extraction, they cannot be trivially extended to the document level, because they handle only relation extraction and do not address the other two sub-tasks, entity mention extraction and entity clustering. In this paper, we propose a three-stage generative framework that leverages a pre-trained BART model to address all three sub-tasks required for document-level relation triple extraction. On the widely used DocRED dataset, our approach outperforms previous generative methods and achieves competitive performance against discriminative models.
2023
90% F1 Score in Relation Triple Extraction: Is it Real?
Pratik Saini, Samiran Pal, Tapas Nayak, Indrajit Bhattacharya
Proceedings of the 1st GenBench Workshop on (Benchmarking) Generalisation in NLP
Extracting relational triples from text is a crucial task for constructing knowledge bases. Recent joint entity and relation extraction models have reported remarkable F1 scores (≥ 90%) for extracting relational triples from free text. However, these models have been evaluated under restrictive experimental settings and on unrealistic datasets: they overlook sentences with zero triples (zero cardinality), which simplifies the task. In this paper, we present a benchmark study of state-of-the-art joint entity and relation extraction models under a more realistic setting that includes sentences without any triples, providing a more comprehensive evaluation. Our findings reveal a significant decline in the models' F1 scores in this realistic setup (approximately 10-15% on one dataset and 6-14% on another). Furthermore, we propose a two-step modeling approach that uses a simple BERT-based classifier, which improves the overall performance of these models in the realistic experimental setting.
Do the Benefits of Joint Models for Relation Extraction Extend to Document-level Tasks?
Pratik Saini, Tapas Nayak, Indrajit Bhattacharya
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)
2022
A Weak Supervision Approach for Predicting Difficulty of Technical Interview Questions
Arpita Kundu, Subhasish Ghosh, Pratik Saini, Tapas Nayak, Indrajit Bhattacharya
Proceedings of the 29th International Conference on Computational Linguistics
Predicting the difficulty of questions is crucial for technical interviews. However, such questions are long-form and more open-ended than the factoid and multiple-choice questions explored so far in question difficulty prediction. Existing models also require large volumes of candidate response data for training. We study weak supervision and use unsupervised algorithms for both question generation and difficulty prediction. We create a dataset of deep learning interview questions with difficulty scores and use it to evaluate state-of-the-art question difficulty prediction models trained with weak supervision. Our analysis highlights both the difficulty of the task and the promise of weak supervision for it.
Unsupervised Generation of Long-form Technical Questions from Textbook Metadata using Structured Templates
Indrajit Bhattacharya, Subhasish Ghosh, Arpita Kundu, Pratik Saini, Tapas Nayak
Proceedings of the First Workshop on Pattern-based Approaches to NLP in the Age of Deep Learning
We explore the task of generating long-form technical questions from textbooks. The semi-structured metadata of a textbook, namely the table of contents (ToC) and the index, provides rich cues for technical question generation. Existing literature on long-form question generation focuses mostly on reading comprehension assessment and does not use semi-structured metadata. We design unsupervised template-based algorithms that generate questions from structural and contextual patterns in the index and ToC. We evaluate our approach on textbooks covering diverse subjects and show that it generates high-quality questions of diverse types. In comparison, zero-shot question generation using pre-trained LLMs on the same metadata produces questions of much poorer quality.