Maria Francis
2024
Who Did You Blame When Your Project Failed? Designing a Corpus for Presupposition Generation in Cross-Examination Dialogues
Maria Francis
|
Julius Steuer
|
Dietrich Klakow
|
Volha Petukhova
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
This paper introduces the corpus for the novel task of presupposition generation - a natural language generation problem where a model produces a list of presuppositions carried by the given input sentence, in the context of the presented research - given the cross-examination question. Two datasets, PECaN (Presupposition, Entailment, Contradiction and Neutral) and PGen (Presuppostion Generation), are designed to fine-tune existing BERT (CITATION) and T5 (CITATION) models for classification and generation tasks. Various corpora construction methods are proposed ranging from manual annotations, prompting the GPT 3.0 model, to augmenting data from the existing corpora. The fine-tuned models achieved high accuracy on the novel Presupposition as Natural Language Inference (PNLI) task which extends the traditional Natural Language Inference (NLI) incorporating instances of presupposition into classification. T5 outperforms BERT by broad margin achieving an overall accuracy of 84.35% compared to 71.85% of BERT, and specifically when classifying presuppositions (93% vs 73% respectively). Regarding presupposition generation, we observed that despite the limited amount of data used for fine-tuning, the model displays an emerging proficiency in generation presuppositions reaching ROUGE scores of 43.47, adhering to systematic patterns that mirror valid strategies for presupposition generation, although failed to generate the complete lists.