Emilio Ferrara

2023

pdf bib abs
Identifying Informational Sources in News Articles
Alexander Spangher | Nanyun Peng | Emilio Ferrara | Jonathan May
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

News articles are driven by the informational sources journalists use in reporting. Modeling when, how and why sources get used together in stories can help us better understand the information we consume and even help journalists with the task of producing it. In this work, we take steps toward this goal by constructing the largest and widest-ranging annotated dataset, to date, of informational sources used in news writing. We first show that our dataset can be used to train high-performing models for information detection and source attribution. Then, we introduce a novel task, source prediction, to study the compositionality of sources in news articles – i.e. how they are chosen to complement each other. We show good modeling performance on this task, indicating that there is a pattern to the way different sources are used together in news storytelling. This insight opens the door for a focus on sources in narrative science (i.e. planning-based language generation) and computational journalism (i.e. a source-recommendation system to aid journalists writing stories). All data and model code can be found at https://github.com/alex2awesome/source-exploration.

We propose CHRT (Control HiddenRepresentation Transformation) – a con-trolled language generation framework thatsteers large language models to generatetext pertaining to certain attributes (such astoxicity). CHRT gains attribute control bymodifying the hidden representation of thebase model through learned transformations. We employ a contrastive-learning frameworkto learn these transformations that can becombined to gain multi-attribute control. Theeffectiveness of CHRT is experimentallyshown by comparing it with seven baselinesover three attributes. CHRT outperforms all thebaselines in the task of detoxification, positivesentiment steering, and text simplificationwhile minimizing the loss in linguistic qualities. Further, our approach has the lowest inferencelatency of only 0.01 seconds more than thebase model, making it the most suitable forhigh-performance production environments. We open-source our code and release two noveldatasets to further propel controlled languagegeneration research

2021

pdf bib abs
Using Word Embedding to Reveal Monetary Policy Explanation Changes
Akira Matsui | Xiang Ren | Emilio Ferrara
Proceedings of the Third Workshop on Economics and Natural Language Processing

Documents have been an essential tool of communication for governments to announce their policy operations. Most policy announcements have taken the form of text to inform their new policies or changes to the public. To understand such policymakers’ communication, many researchers exploit published policy documents. However, the methods well-used in other research domains such as sentiment analysis or topic modeling are not suitable for studying policy communications. Their training corpora and methods are not for policy documents where technical terminologies are used, and sentiment expressions are refrained. We leverage word embedding techniques to extract semantic changes in the monetary policy documents. Our empirical study shows that the policymaker uses different semantics according to the type of documents when they change their policy.