Oana Balalau


2023

pdf bib
Open Information Extraction with Entity Focused Constraints
Prajna Upadhyay | Oana Balalau | Ioana Manolescu
Findings of the Association for Computational Linguistics: EACL 2023

Open Information Extraction (OIE) is the task of extracting tuples of the form (subject, predicate, object), without any knowledge of the type and lexical form of the predicate, the subject, or the object. In this work, we focus on improving OIE quality by exploiting domain knowledge about the subject and object. More precisely, knowing that the subjects and objects in sentences are oftentimes named entities, we explore how to inject constraints in the extraction through constrained inference and constraint-aware training. Our work leverages the state-of-the-art OpenIE6 platform, which we adapt to our setting. Through a carefully constructed training dataset and constrained training, we obtain a 29.17% F1-score improvement in the CaRB metric and a 24.37% F1-score improvement in the WIRe57 metric. Our technique has important applications – one of them is investigative journalism, where automatically extracting conflict-of-interest between scientists and funding organizations helps understand the type of relations companies engage with the scientists.

pdf bib
FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation
Kun Zhang | Oana Balalau | Ioana Manolescu
Findings of the Association for Computational Linguistics: EMNLP 2023

Graph-to-text (G2T) generation takes a graph as input and aims to generate a fluent and faith- ful textual representation of the information in the graph. The task has many applications, such as dialogue generation and question an- swering. In this work, we investigate to what extent the G2T generation problem is solved for previously studied datasets, and how pro- posed metrics perform when comparing generated texts. To help address their limitations, we propose a new metric that correctly identifies factual faithfulness, i.e., given a triple (subject, predicate, object), it decides if the triple is present in a generated text. We show that our metric FactSpotter achieves the highest correlation with human annotations on data correct- ness, data coverage, and relevance. In addition, FactSpotter can be used as a plug-in feature to improve the factual faithfulness of existing models. Finally, we investigate if existing G2T datasets are still challenging for state-of-the-art models. Our code is available online: https://github.com/guihuzhang/FactSpotter.

2021

pdf bib
Breaking Down the Invisible Wall of Informal Fallacies in Online Discussions
Saumya Sahai | Oana Balalau | Roxana Horincar
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

People debate on a variety of topics on online platforms such as Reddit, or Facebook. Debates can be lengthy, with users exchanging a wealth of information and opinions. However, conversations do not always go smoothly, and users sometimes engage in unsound argumentation techniques to prove a claim. These techniques are called fallacies. Fallacies are persuasive arguments that provide insufficient or incorrect evidence to support the claim. In this paper, we study the most frequent fallacies on Reddit, and we present them using the pragma-dialectical theory of argumentation. We construct a new annotated dataset of fallacies, using user comments containing fallacy mentions as noisy labels, and cleaning the data via crowdsourcing. Finally, we study the task of classifying fallacies using neural models. We find that generally the models perform better in the presence of conversational context. We have released the data and the code at github.com/sahaisaumya/informal_fallacies.

pdf bib
From the Stage to the Audience: Propaganda on Reddit
Oana Balalau | Roxana Horincar
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Political discussions revolve around ideological conflicts that often split the audience into two opposing parties. Both parties try to win the argument by bringing forward information. However, often this information is misleading, and its dissemination employs propaganda techniques. In this work, we analyze the impact of propaganda on six major political forums on Reddit that target a diverse audience in two countries, the US and the UK. We focus on three research questions: who is posting propaganda? how does propaganda differ across the political spectrum? and how is propaganda received on political forums?

2019

pdf bib
Discovering the Functions of Language in Online Forums
Youmna Ismaeil | Oana Balalau | Paramita Mirza
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)

In this work, we revisit the functions of language proposed by linguist Roman Jakobson and we highlight their potential in analyzing online forum conversations. We investigate the relationship between functions and other properties of comments, such as controversiality. We propose and evaluate a semi-supervised framework for predicting the functions of Reddit comments. To accommodate further research, we release a corpus of 165K comments annotated with their functions of language.