Ioana Manolescu

2025

Structured Discourse Representation for Factual Consistency Verification
Kun Zhang | Oana Balalau | Ioana Manolescu
Findings of the Association for Computational Linguistics: ACL 2025

Analysing the differences in how events are represented across texts, or verifying whether language model generations contain hallucinations, requires the ability to systematically compare their content. To support such comparison, a structured representation that captures fine-grained information plays a vital role. In particular, identifying distinct atomic facts and the discourse relations connecting them enables deeper semantic comparison. Our proposed approach combines structured discourse information extraction with a classifier, FDSpotter, for factual consistency verification. We show that adversarial discourse relations pose challenges for language models, but fine-tuning on our annotated data, DiscInfer, achieves competitive performance. Our proposed approach advances factual consistency verification by grounding it in linguistic structure and decomposing it into interpretable components. We demonstrate the effectiveness of our method on the evaluation of two tasks: data-to-text generation and text summarisation. Our code and dataset will be publicly available on GitHub.

The Search for Conflicts of Interest: Open Information Extraction in Scientific Publications
Garima Gaur | Oana Balalau | Ioana Manolescu | Prajna Upadhyay
Findings of the Association for Computational Linguistics: EMNLP 2025

A conflict of interest (COI) appears when a person or a company has two or more interests that may directly conflict. This happens, for instance, when a scientist whose research is funded by a company audits the same company. For transparency and to avoid undue influence, public repositories of relations of interest are increasingly recommended or mandated in various domains, and can be used to avoid COIs. In this work, we propose an LLM-based open information extraction (OpenIE) framework for extracting financial or other types of interesting relations from scientific text. We target scientific publications in which authors declare funding sources or collaborations in the acknowledgment section, in the metadata, or in the publication, following editors’ requirements. We introduce an extraction methodology and present a knowledge base (KB) with a comprehensive taxonomy of COI-centric relations. Finally, we perform a comparative study of disclosures of two journals in the field of toxicology and pharmacology.

2023

FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation
Kun Zhang | Oana Balalau | Ioana Manolescu
Findings of the Association for Computational Linguistics: EMNLP 2023

Graph-to-text (G2T) generation takes a graph as input and aims to generate a fluent and faithful textual representation of the information in the graph. The task has many applications, such as dialogue generation and question answering. In this work, we investigate to what extent the G2T generation problem is solved for previously studied datasets, and how proposed metrics perform when comparing generated texts. To help address their limitations, we propose a new metric that correctly identifies factual faithfulness, i.e., given a triple (subject, predicate, object), it decides if the triple is present in a generated text. We show that our metric FactSpotter achieves the highest correlation with human annotations on data correctness, data coverage, and relevance. In addition, FactSpotter can be used as a plug-in feature to improve the factual faithfulness of existing models. Finally, we investigate if existing G2T datasets are still challenging for state-of-the-art models. Our code is available online: https://github.com/guihuzhang/FactSpotter.

Open Information Extraction with Entity Focused Constraints
Prajna Upadhyay | Oana Balalau | Ioana Manolescu
Findings of the Association for Computational Linguistics: EACL 2023

Open Information Extraction (OIE) is the task of extracting tuples of the form (subject, predicate, object), without any knowledge of the type and lexical form of the predicate, the subject, or the object. In this work, we focus on improving OIE quality by exploiting domain knowledge about the subject and object. More precisely, knowing that the subjects and objects in sentences are often named entities, we explore how to inject constraints in the extraction through constrained inference and constraint-aware training. Our work leverages the state-of-the-art OpenIE6 platform, which we adapt to our setting. Through a carefully constructed training dataset and constrained training, we obtain a 29.17% F1-score improvement in the CaRB metric and a 24.37% F1-score improvement in the WIRe57 metric. Our technique has important applications; one of them is investigative journalism, where automatically extracting conflicts of interest between scientists and funding organizations helps understand the types of relations companies engage in with scientists.
