Emilio Ferrara


2024

Can Language Model Moderators Improve the Health of Online Discourse?
Hyundong Cho | Shuai Liu | Taiwei Shi | Darpan Jain | Basem Rizk | Yuyang Huang | Zixun Lu | Nuan Wen | Jonathan Gratch | Emilio Ferrara | Jonathan May
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Conversational moderation of online communities is crucial to maintaining civility for a constructive environment, but it is challenging to scale and harmful to moderators. The inclusion of sophisticated natural language generation modules as a force multiplier to aid human moderators is a tantalizing prospect, but adequate evaluation approaches have so far been elusive. In this paper, we establish a systematic definition of conversational moderation effectiveness grounded on moderation literature and establish design criteria for conducting realistic yet safe evaluation. We then propose a comprehensive evaluation framework to assess models’ moderation capabilities independently of human intervention. With our framework, we conduct the first known study of language models as conversational moderators, finding that appropriately prompted models that incorporate insights from social science can provide specific and fair feedback on toxic behavior but struggle to influence users to increase their levels of respect and cooperation.

2023

Controlled Text Generation with Hidden Representation Transformations
Vaibhav Kumar | Hana Koorehdavoudi | Masud Moshtaghi | Amita Misra | Ankit Chadha | Emilio Ferrara
Findings of the Association for Computational Linguistics: ACL 2023

We propose CHRT (Control Hidden Representation Transformation) – a controlled language generation framework that steers large language models to generate text pertaining to certain attributes (such as toxicity). CHRT gains attribute control by modifying the hidden representation of the base model through learned transformations. We employ a contrastive-learning framework to learn these transformations that can be combined to gain multi-attribute control. The effectiveness of CHRT is experimentally shown by comparing it with seven baselines over three attributes. CHRT outperforms all the baselines in the task of detoxification, positive sentiment steering, and text simplification while minimizing the loss in linguistic qualities. Further, our approach has the lowest inference latency of only 0.01 seconds more than the base model, making it the most suitable for high-performance production environments. We open-source our code and release two novel datasets to further propel controlled language generation research.

Identifying Informational Sources in News Articles
Alexander Spangher | Nanyun Peng | Emilio Ferrara | Jonathan May
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

News articles are driven by the informational sources journalists use in reporting. Modeling when, how and why sources get used together in stories can help us better understand the information we consume and even help journalists with the task of producing it. In this work, we take steps toward this goal by constructing the largest and widest-ranging annotated dataset, to date, of informational sources used in news writing. We first show that our dataset can be used to train high-performing models for information detection and source attribution. Then, we introduce a novel task, source prediction, to study the compositionality of sources in news articles – i.e. how they are chosen to complement each other. We show good modeling performance on this task, indicating that there is a pattern to the way different sources are used together in news storytelling. This insight opens the door for a focus on sources in narrative science (i.e. planning-based language generation) and computational journalism (i.e. a source-recommendation system to aid journalists writing stories). All data and model code can be found at https://github.com/alex2awesome/source-exploration.

2021

Using Word Embedding to Reveal Monetary Policy Explanation Changes
Akira Matsui | Xiang Ren | Emilio Ferrara
Proceedings of the Third Workshop on Economics and Natural Language Processing

Documents have been an essential tool of communication for governments to announce their policy operations. Most policy announcements take the form of text that informs the public of new policies or changes. To understand policymakers’ communication, many researchers analyze published policy documents. However, methods widely used in other research domains, such as sentiment analysis or topic modeling, are not suitable for studying policy communications: their training corpora and methods are not designed for policy documents, where technical terminology is used and expressions of sentiment are avoided. We leverage word embedding techniques to extract semantic changes in monetary policy documents. Our empirical study shows that policymakers use different semantics depending on the type of document when they change their policy.

2020

Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020
Karin Verspoor | Kevin Bretonnel Cohen | Mark Dredze | Emilio Ferrara | Jonathan May | Robert Munro | Cecile Paris | Byron Wallace
Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020

Enabling Low-Resource Transfer Learning across COVID-19 Corpora by Combining Event-Extraction and Co-Training
Alexander Spangher | Nanyun Peng | Jonathan May | Emilio Ferrara
Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020