pdf
bib
Proceedings of the Big Picture Workshop
Yanai Elazar
|
Allyson Ettinger
|
Nora Kassner
|
Sebastian Ruder
|
Noah A. Smith
pdf
bib
abs
Where are We in Event-centric Emotion Analysis? Bridging Emotion Role Labeling and Appraisal-based Approaches
Roman Klinger
The term emotion analysis in text subsumes various natural language processing tasks which have in common the goal to enable computers to understand emotions. Most popular is emotion classification in which one or multiple emotions are assigned to a predefined textual unit. While such setting is appropriate for identifying the reader’s or author’s emotion, emotion role labeling adds the perspective of mentioned entities and extracts text spans that correspond to the emotion cause. The underlying emotion theories agree on one important point; that an emotion is caused by some internal or external event and comprises several subcomponents, including the subjective feeling and a cognitive evaluation. We therefore argue that emotions and events are related in two ways. (1) Emotions are events; and this perspective is the fundament in natural language processing for emotion role labeling. (2) Emotions are caused by events; a perspective that is made explicit with research how to incorporate psychological appraisal theories in NLP models to interpret events. These two research directions, role labeling and (event-focused) emotion classification, have by and large been tackled separately. In this paper, we contextualize both perspectives and discuss open research questions.
pdf
bib
abs
Working Towards Digital Documentation of Uralic Languages With Open-Source Tools and Modern NLP Methods
Mika Hämäläinen
|
Jack Rueter
|
Khalid Alnajjar
|
Niko Partanen
We present our work towards building an infrastructure for documenting endangered languages with the focus on Uralic languages in particular. Our infrastructure consists of tools to write dictionaries so that entries are structured in XML format. These dictionaries are the foundation for rule-based NLP tools such as FSTs. We also work actively towards enhancing these dictionaries and tools by using the latest state-of-the-art neural models by generating training data through rules and lexica
pdf
bib
abs
Computational Narrative Understanding: A Big Picture Analysis
Andrew Piper
This paper provides an overview of outstanding major research goals for the field of computational narrative understanding. Storytelling is an essential human practice, one that provides a sense of personal meaning, shared sense of community, and individual enjoyment. A number of research domains have increasingly focused on storytelling as a key mechanism for explaining human behavior. Now is an opportune moment to provide a vision of the contributions that computational narrative understanding can make towards this collective endeavor and the challenges facing the field. In addition to providing an overview of the elements of narrative, this paper outlines three major lines of inquiry: understanding the multi-modality of narrative; the temporal patterning of narrative (narrative “shape”); and socio-cultural narrative schemas, i.e. collective narratives. The paper concludes with a call for more inter-disciplinary working groups and deeper investment in building cross-cultural and multi-modal narrative datasets.
pdf
bib
abs
The Case for Scalable, Data-Driven Theory: A Paradigm for Scientific Progress in NLP
Julian Michael
I propose a paradigm for scientific progress in NLP centered around developing scalable, data-driven theories of linguistic structure. The idea is to collect data in tightly scoped, carefully defined ways which allow for exhaustive annotation of behavioral phenomena of interest, and then use machine learning to construct explanatory theories of these phenomena which can form building blocks for intelligible AI systems. After laying some conceptual groundwork, I describe several investigations into data-driven theories of shallow semantic structure using Question-Answer driven Semantic Role Labeling (QA-SRL), a schema for annotating verbal predicate-argument relations using highly constrained question-answer pairs. While this only scratches the surface of the complex language behaviors of interest in AI, I outline principles for data collection and theoretical modeling which can inform future scientific progress. This note summarizes and draws heavily on my PhD thesis.
pdf
bib
abs
Thesis Distillation: Investigating The Impact of Bias in NLP Models on Hate Speech Detection
Fatma Elsafoury
This paper is a summary of the work done in my PhD thesis. Where I investigate the impact of bias in NLP models on the task of hate speech detection from three perspectives: explainability, offensive stereotyping bias, and fairness. Then, I discuss the main takeaways from my thesis and how they can benefit the broader NLP community. Finally, I discuss important future research directions. The findings of my thesis suggest that the bias in NLP models impacts the task of hate speech detection from all three perspectives. And that unless we start incorporating social sciences in studying bias in NLP models, we will not effectively overcome the current limitations of measuring and mitigating bias in NLP models.
pdf
bib
abs
Large Language Models as SocioTechnical Systems
Kaustubh Dhole
The expectation of Large Language Models (LLMs) to solve various societal problems has ignored the larger socio-technical frame of reference under which they operate. From a socio-technical perspective, LLMs are necessary to look at separately from other ML models as they have radically different implications in society never witnessed before. In this article, we ground Selbst et al.(2019)’s five abstraction traps – The Framing Trap, The Portability Trap, The Formalism Trap, The Ripple Effect Trap and the Solutionism Trap in the context of LLMs discussing the problems associated with the abstraction and fairness of LLMs. Through learnings from previous studies and examples, we discuss each trap that LLMs fall into, and propose ways to address the points of LLM failure by gauging them from a socio-technical lens. We believe the discussions would provide a broader perspective of looking at LLMs through a sociotechnical lens and our recommendations could serve as baselines to effectively demarcate responsibilities among the various technical and social stakeholders and inspire future LLM research.
pdf
bib
abs
Towards Low-resource Language Generation with Limited Supervision
Kaushal Maurya
|
Maunendra Desarkar
We present a research narrative aimed at enabling language technology for multiple natural language generation (NLG) tasks in low-resource languages (LRLs). With approximately 7,000 languages spoken globally, many lack the resources required for model training. NLG applications for LRLs present two additional key challenges: (i) The training is more pronounced, and (ii) Zero-shot modeling is a viable research direction for scalability; however, generating zero-shot well-formed text in target LRLs is challenging. Addressing these concerns, this narrative introduces three promising research explorations that serve as a step toward enabling language technology for many LRLs. These approaches make effective use of transfer learning and limited supervision techniques for modeling. Evaluations were conducted mostly in the zero-shot setting, enabling scalability. This research narrative is an ongoing doctoral thesis.
pdf
bib
abs
Transformers as Graph-to-Graph Models
James Henderson
|
Alireza Mohammadshahi
|
Andrei Coman
|
Lesly Miculicich
We argue that Transformers are essentially graph-to-graph models, with sequences just being a special case. Attention weights are functionally equivalent to graph edges. Our Graph-to-Graph Transformer architecture makes this ability explicit, by inputting graph edges into the attention weight computations and predicting graph edges with attention-like functions, thereby integrating explicit graphs into the latent graphs learned by pretrained Transformers. Adding iterative graph refinement provides a joint embedding of input, output, and latent graphs, allowing non-autoregressive graph prediction to optimise the complete graph without any bespoke pipeline or decoding strategy. Empirical results show that this architecture achieves state-of-the-art accuracies for modelling a variety of linguistic structures, integrating very effectively with the latent linguistic representations learned by pretraining.
pdf
bib
abs
It’s MBR All the Way Down: Modern Generation Techniques Through the Lens of Minimum Bayes Risk
Amanda Bertsch
|
Alex Xie
|
Graham Neubig
|
Matthew Gormley
Minimum Bayes Risk (MBR) decoding is a method for choosing the outputs of a machine learning system based not on the output with the highest probability, but the output with the lowest risk (expected error) among multiple candidates. It is a simple but powerful method: for an additional cost at inference time, MBR provides reliable several-point improvements across metrics for a wide variety of tasks without any additional data or training. Despite this, MBR is not frequently applied in NLP works, and knowledge of the method itself is limited. We first provide an introduction to the method and the recent literature. We show that several recent methods that do not reference MBR can be written as special cases of MBR; this reformulation provides additional theoretical justification for the performance of these methods, explaining some results that were previously only empirical. We provide theoretical and empirical results about the effectiveness of various MBR variants and make concrete recommendations for the application of MBR in NLP models, including future directions in this area.
pdf
bib
abs
Analyzing Pre-trained and Fine-tuned Language Models
Marius Mosbach
Since the introduction of transformer-based language models in 2018, the current generation of natural language processing (NLP) models continues to demonstrate impressive capabilities on a variety of academic benchmarks and real-world applications. This progress is based on a simple but general pipeline which consists of pre-training neural language models on large quantities of text, followed by an adaptation step that fine-tunes the pre-trained model to perform a specific NLP task of interest. However, despite the impressive progress on academic benchmarks and the widespread deployment of pre-trained and fine-tuned language models in industry we still lack a fundamental understanding of how and why pre-trained and fine-tuned language models work as well as the individual steps of the pipeline that produce them. We makes several contributions towards improving our understanding of pre-trained and fine-tuned language models, ranging from analyzing the linguistic knowledge of pre-trained language models and how it is affected by fine-tuning, to a rigorous analysis of the fine-tuning process itself and how the choice of adaptation technique affects the generalization of models and thereby provide new insights about previously unexplained phenomena and the capabilities of pre-trained and fine-tuned language models.