Ekaterina Shutova - ACL Anthology

Ekaterina Shutova

2026

Best-of-L: Cross-Lingual Reward Modeling for Mathematical Reasoning
Sara Rajaee | Rochelle Choenni | Ekaterina Shutova | Christof Monz
Findings of the Association for Computational Linguistics: EACL 2026

While the reasoning abilities of large language models (LLMs) continue to advance, it remains underexplored how such abilities vary across languages in multilingual LLMs and whether different languages generate distinct reasoning paths. In this work, we show that reasoning traces generated in different languages often provide complementary signals for mathematical reasoning. We propose cross-lingual outcome reward modeling, a framework that ranks candidate reasoning traces across languages rather than within a single language. Our experiments on the MGSM benchmark show that cross-lingual reward modeling improves accuracy by up to 10 points compared to using reward modeling within a single language, benefiting both high- and low-resource languages. Notably, cross-lingual sampling improves English performance under low inference budgets, despite English being the strongest individual language. Our findings reveal new opportunities to improve multilingual reasoning by leveraging the complementary strengths of diverse languages.
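The contrast between ranking within one language and ranking across languages can be sketched as follows. This is only an illustration, not the authors' implementation: the `Candidate` class, the helper names, and all reward values are hypothetical, and the rewards stand in for the scores of some outcome reward model.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    language: str   # language the reasoning trace was generated in
    answer: str     # final answer extracted from the trace
    reward: float   # score from a (hypothetical) outcome reward model


def best_within_language(candidates, language):
    """Pick the highest-reward trace among those written in one language."""
    pool = [c for c in candidates if c.language == language]
    return max(pool, key=lambda c: c.reward)


def best_across_languages(candidates):
    """Pick the highest-reward trace from the pooled, multilingual candidate set."""
    return max(candidates, key=lambda c: c.reward)


# Invented toy data: the rewards and answers are made up for illustration.
candidates = [
    Candidate("en", "40", 0.61),
    Candidate("en", "42", 0.58),
    Candidate("de", "42", 0.83),
    Candidate("sw", "41", 0.35),
]

english_pick = best_within_language(candidates, "en")
cross_lingual_pick = best_across_languages(candidates)
print(english_pick.language, english_pick.answer)
print(cross_lingual_pick.language, cross_lingual_pick.answer)
```

In this toy setting the pooled ranking can select a trace from another language when its reward is higher than that of any trace in the target language.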

2025

Do large language models solve verbal analogies like children do?
Tamar Johnson | Mathilde ter Veen | Rochelle Choenni | Han van der Maas | Ekaterina Shutova | Claire E Stevenson
Proceedings of the 29th Conference on Computational Natural Language Learning

Analogy-making lies at the heart of human cognition. Adults solve analogies such as horse belongs to stable like chicken belongs to …? by mapping relations (kept in) and answering chicken coop. In contrast, young children often use association, e.g., answering egg. This paper investigates whether large language models (LLMs) solve verbal analogies in A:B::C:? form using associations, similar to what children do. We use verbal analogies extracted from an online learning environment, where 14,006 7-12 year-olds from the Netherlands solved 872 analogies in Dutch. The eight tested LLMs performed at or above the level of children, with some models approaching adult performance estimates. However, when we control for solving by association this picture changes. We conclude that the LLMs we tested rely heavily on association like young children do. However, LLMs make different errors than children, and association doesn’t fully explain their superior performance on this children’s verbal analogy task. Future work will investigate whether LLMs’ associations and errors are more similar to adult relational reasoning.

An Empirical Analysis of Machine Translation for Expanding Multilingual Benchmarks
Sara Rajaee | Rochelle Choenni | Ekaterina Shutova | Christof Monz
Proceedings of the Tenth Conference on Machine Translation

The rapid advancement of large language models (LLMs) has introduced new challenges in their evaluation, particularly for multilingual settings. The shortage of evaluation data is more pronounced in low-resource languages due to the scarcity of professional annotators, hindering fair progress across languages. In this work, we systematically investigate the viability of using machine translation (MT) as a proxy for evaluation in scenarios where human-annotated test sets are unavailable. Leveraging a state-of-the-art translation model, we translate datasets from four tasks into 198 languages and employ these translations to assess the quality and robustness of MT-based multilingual evaluation under different setups. We analyze task-specific error patterns, identifying when MT-based evaluation is reliable and when it produces misleading results. Our translated benchmark reveals that current language selections in multilingual datasets tend to overestimate LLM performance on low-resource languages. We conclude that although machine translation is not yet a fully reliable method for evaluating multilingual models, overlooking its potential means missing a valuable opportunity to track progress in non-English languages.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Wanxiang Che | Joyce Nabende | Ekaterina Shutova | Mohammad Taher Pilehvar

Induction Heads as an Essential Mechanism for Pattern Matching in In-context Learning
Joy Crosbie | Ekaterina Shutova
Findings of the Association for Computational Linguistics: NAACL 2025

Large language models (LLMs) have shown a remarkable ability to learn and perform complex tasks through in-context learning (ICL). However, a comprehensive understanding of its internal mechanisms is still lacking. This paper explores the role of induction heads in a few-shot ICL setting. We analyse two state-of-the-art models, Llama-3-8B and InternLM2-20B, on abstract pattern recognition and NLP tasks. Our results show that even a minimal ablation of induction heads leads to ICL performance decreases of up to ~32% for abstract pattern recognition tasks, bringing the performance close to random. For NLP tasks, this ablation substantially decreases the model’s ability to benefit from examples, bringing few-shot ICL performance close to that of zero-shot prompts. We further use attention knockout to disable specific induction patterns, and present fine-grained evidence for the role that the induction mechanism plays in ICL.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Wanxiang Che | Joyce Nabende | Ekaterina Shutova | Mohammad Taher Pilehvar

Beyond Words: Exploring Cultural Value Sensitivity in Multimodal Models
Srishti Yadav | Zhi Zhang | Daniel Hershcovich | Ekaterina Shutova
Findings of the Association for Computational Linguistics: NAACL 2025

Investigating value alignment in Large Language Models (LLMs) based on cultural context has become a critical area of research. However, similar biases have not been extensively explored in large vision-language models (VLMs). As the scale of multimodal models continues to grow, it becomes increasingly important to assess whether images can serve as reliable proxies for culture and how these values are embedded through the integration of both visual and textual data. In this paper, we conduct a thorough evaluation of multimodal models at different scales, focusing on their alignment with cultural values. Our findings reveal that, much like LLMs, VLMs exhibit sensitivity to cultural values, but their performance in aligning with these values is highly context-dependent. While VLMs show potential in improving value understanding through the use of images, this alignment varies significantly across contexts, highlighting the complexities and underexplored challenges in the alignment of multimodal models.

Findings of the Association for Computational Linguistics: ACL 2025
Wanxiang Che | Joyce Nabende | Ekaterina Shutova | Mohammad Taher Pilehvar

NeuroAda: Activating Each Neuron’s Potential for Parameter-Efficient Fine-Tuning
Zhi Zhang | Yixian Shen | Congfeng Cao | Ekaterina Shutova
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Existing parameter-efficient fine-tuning (PEFT) methods primarily fall into two categories: addition-based and selective in-situ adaptation. The former, such as LoRA, introduce additional modules to adapt the model to downstream tasks, offering strong memory efficiency. However, their representational capacity is often limited, making them less suitable for fine-grained adaptation. In contrast, the latter directly fine-tunes a carefully chosen subset of the original model parameters, allowing for more precise and effective adaptation, but at the cost of significantly increased memory consumption. To reconcile this trade-off, we propose NeuroAda, a novel PEFT method that enables fine-grained model fine-tuning while maintaining high memory efficiency. Our approach first identifies important parameters (i.e., connections within the network) as in selective adaptation, and then introduces bypass connections for these selected parameters. During fine-tuning, only the bypass connections are updated, leaving the original model parameters frozen. Empirical results on 23+ tasks spanning both natural language generation and understanding demonstrate that NeuroAda achieves state-of-the-art performance with ≤ 0.02% trainable parameters, while reducing CUDA memory usage by up to 60%. We release our code here: https://github.com/FightingFighting/NeuroAda.git.

2024

A (More) Realistic Evaluation Setup for Generalisation of Community Models on Malicious Content Detection
Ivo Verhoeven | Pushkar Mishra | Rahel Beloch | Helen Yannakoudakis | Ekaterina Shutova
Findings of the Association for Computational Linguistics: NAACL 2024

Community models for malicious content detection, which take into account the context from a social graph alongside the content itself, have shown remarkable performance on benchmark datasets. Yet, misinformation and hate speech continue to propagate on social media networks. This mismatch can be partially attributed to the limitations of current evaluation setups that neglect the rapid evolution of online content and the underlying social graph. In this paper, we propose a novel evaluation setup for model generalisation based on our few-shot subgraph sampling approach. This setup tests for generalisation through few labelled examples in local explorations of a larger graph, emulating more realistic application settings. We show this to be a challenging inductive setup, wherein strong performance on the training graph is not indicative of performance on unseen tasks, domains, or graph structures. Lastly, we show that graph meta-learners trained with our proposed few-shot subgraph sampling outperform standard community models in the inductive setup.

Examining Modularity in Multilingual LMs via Language-Specialized Subnetworks
Rochelle Choenni | Ekaterina Shutova | Dan Garrette
Findings of the Association for Computational Linguistics: NAACL 2024

Recent work has proposed explicitly inducing language-wise modularity in multilingual LMs via sparse fine-tuning (SFT) on per-language subnetworks as a means of better guiding cross-lingual sharing. In this paper, we investigate (1) the degree to which language-wise modularity *naturally* arises within models with no special modularity interventions, and (2) how cross-lingual sharing and interference differ between such models and those with explicit SFT-guided subnetwork modularity. In order to do so, we use XLM-R as our multilingual LM. Moreover, to quantify language specialization and cross-lingual interaction, we use a Training Data Attribution method that estimates the degree to which a model’s predictions are influenced by in-language or cross-language training examples. Our results show that language-specialized subnetworks do naturally arise, and that SFT, rather than always increasing modularity, can decrease language specialization of subnetworks in favor of more cross-lingual sharing.

Learning New Tasks from a Few Examples with Soft-Label Prototypes
Avyav Singh | Ekaterina Shutova | Helen Yannakoudakis
Proceedings of the 9th Workshop on Representation Learning for NLP (RepL4NLP-2024)

Existing approaches to few-shot learning in NLP rely on large language models (LLMs) and/or fine-tuning of these to generalise on out-of-distribution data. In this work, we propose a novel few-shot learning approach based on soft-label prototypes (SLPs) designed to collectively capture the distribution of different classes across the input domain space. We focus on learning previously unseen NLP tasks from very few examples (4, 8, 16) per class and experimentally demonstrate that our approach achieves superior performance on the majority of tested tasks in this data-lean setting while being highly parameter efficient. We also show that our few-shot adaptation method can be integrated into more generalised learning settings, primarily meta-learning, to yield superior performance against strong baselines.

Are LLMs classical or nonmonotonic reasoners? Lessons from generics
Alina Leidinger | Robert Van Rooij | Ekaterina Shutova
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Recent scholarship on reasoning in LLMs has supplied evidence of impressive performance and flexible adaptation to machine-generated or human critique. Nonmonotonic reasoning, crucial to human cognition for navigating the real world, remains a challenging, yet understudied task. In this work, we study nonmonotonic reasoning capabilities of seven state-of-the-art LLMs in one abstract and one commonsense reasoning task featuring generics, such as ‘Birds fly’, and exceptions, ‘Penguins don’t fly’ (see Fig. 1). While LLMs exhibit reasoning patterns in accordance with human nonmonotonic reasoning abilities, they fail to maintain stable beliefs on truth conditions of generics at the addition of supporting examples (‘Owls fly’) or unrelated information (‘Lions have manes’). Our findings highlight pitfalls in attributing human reasoning behaviours to LLMs as long as consistent reasoning remains elusive.

Metaphor Understanding Challenge Dataset for LLMs
Xiaoyu Tong | Rochelle Choenni | Martha Lewis | Ekaterina Shutova
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Metaphors in natural language are a reflection of fundamental cognitive processes such as analogical reasoning and categorisation, and are deeply rooted in everyday communication. Metaphor understanding is therefore an essential task for large language models (LLMs). We release the Metaphor Understanding Challenge Dataset (MUNCH), designed to evaluate the metaphor understanding capabilities of LLMs. The dataset provides over 10k paraphrases for sentences containing metaphor use, as well as 1.5k instances containing inapt paraphrases. The inapt paraphrases were carefully selected to serve as control to determine whether the model indeed performs full metaphor interpretation or rather resorts to lexical similarity. All apt and inapt paraphrases were manually annotated. The metaphorical sentences cover natural metaphor uses across 4 genres (academic, news, fiction, and conversation), and they exhibit different levels of novelty. Experiments with LLaMA and GPT-3.5 demonstrate that MUNCH presents a challenging task for LLMs. The dataset is freely accessible at https://github.com/xiaoyuisrain/metaphor-understanding-challenge.

The Echoes of Multilinguality: Tracing Cultural Value Shifts during Language Model Fine-tuning
Rochelle Choenni | Anne Lauscher | Ekaterina Shutova
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Texts written in different languages reflect different culturally-dependent beliefs of their writers. Thus, we expect multilingual LMs (MLMs), which are jointly trained on a concatenation of text in multiple languages, to encode different cultural values for each language. Yet, as the ‘multilinguality’ of these LMs is driven by cross-lingual sharing, we also have reason to believe that cultural values bleed over from one language into another. This limits the use of MLMs in practice, as apart from being proficient in generating text in multiple languages, creating language technology that can serve a community also requires the output of LMs to be sensitive to their biases (Naous et al. 2023). Yet, little is known about how cultural values emerge and evolve in MLMs (Hershcovich et al. 2022). We are the first to study how languages can exert influence on the cultural values encoded for different test languages, by studying how such values are revised during fine-tuning. Focusing on the fine-tuning stage allows us to study the interplay between value shifts when exposed to new linguistic experience from different data sources and languages. Lastly, we use a training data attribution method to find patterns in the fine-tuning examples, and the languages that they come from, that tend to instigate value shifts.

2023

What’s the Meaning of Superhuman Performance in Today’s NLU?
Simone Tedeschi | Johan Bos | Thierry Declerck | Jan Hajič | Daniel Hershcovich | Eduard Hovy | Alexander Koller | Simon Krek | Steven Schockaert | Rico Sennrich | Ekaterina Shutova | Roberto Navigli
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In the last five years, there has been a significant focus in Natural Language Processing (NLP) on developing larger Pretrained Language Models (PLMs) and introducing benchmarks such as SuperGLUE and SQuAD to measure their abilities in language understanding, reasoning, and reading comprehension. These PLMs have achieved impressive results on these benchmarks, even surpassing human performance in some cases. This has led to claims of superhuman capabilities and the provocative idea that certain tasks have been solved. In this position paper, we take a critical look at these claims and ask whether PLMs truly have superhuman abilities and what the current benchmarks are really evaluating. We show that these benchmarks have serious limitations affecting the comparison between humans and PLMs and provide recommendations for fairer and more transparent benchmarks.

How do languages influence each other? Studying cross-lingual data sharing during LM fine-tuning
Rochelle Choenni | Dan Garrette | Ekaterina Shutova
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Multilingual language models (MLMs) are jointly trained on data from many different languages such that representation of individual languages can benefit from other languages’ data. Impressive performance in zero-shot cross-lingual transfer shows that these models are able to exploit this property. Yet, it remains unclear to what extent, and under which conditions, languages rely on each other’s data. To answer this question, we use TracIn (Pruthi et al., 2020), a training data attribution (TDA) method, to retrieve training samples from multilingual data that are most influential for test predictions in a given language. This allows us to analyse cross-lingual sharing mechanisms of MLMs from a new perspective. While previous work studied cross-lingual sharing at the model parameter level, we present the first approach to study it at the data level. We find that MLMs rely on data from multiple languages during fine-tuning and this reliance increases as fine-tuning progresses. We further find that training samples from other languages can both reinforce and complement the knowledge acquired from data of the test language itself.
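As a rough illustration of the attribution idea, the checkpoint-based form of TracIn scores a training sample by summing, over saved checkpoints, the learning rate times the dot product of the training-sample and test-sample loss gradients. The sketch below assumes those gradients have already been computed; the function names, sample identifiers, and every number are invented for illustration and are not taken from the paper.

```python
def dot(u, v):
    """Dot product of two equally long gradient vectors."""
    return sum(a * b for a, b in zip(u, v))


def tracin_influence(train_grads, test_grads, learning_rates):
    """Sum over checkpoints of lr * <grad(train sample), grad(test sample)>."""
    return sum(
        lr * dot(g_train, g_test)
        for g_train, g_test, lr in zip(train_grads, test_grads, learning_rates)
    )


# Invented toy values: two checkpoints, two-dimensional gradients.
learning_rates = [0.1, 0.05]
test_grads = [[1.0, 0.0], [0.5, 0.5]]  # gradients of one test sample
train_grads = {
    ("en", "sample-1"): [[1.0, 1.0], [1.0, 0.0]],
    ("nl", "sample-2"): [[-1.0, 0.0], [0.0, -1.0]],
}

influence = {
    key: tracin_influence(grads, test_grads, learning_rates)
    for key, grads in train_grads.items()
}
# Most positively influential training samples first.
ranked = sorted(influence, key=influence.get, reverse=True)
print(ranked)
```

Grouping the top-ranked training samples by their language tag is one simple way to see how much a test prediction leans on in-language versus cross-language data.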

K-hop neighbourhood regularization for few-shot learning on graphs: A case study of text classification
Niels van der Heijden | Ekaterina Shutova | Helen Yannakoudakis
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

We present FewShotTextGCN, a novel method designed to effectively utilize the properties of word-document graphs for improved learning in low-resource settings. We introduce K-hop Neighbourhood Regularization, a regularizer for heterogeneous graphs, and show that it stabilizes and improves learning when only a few training samples are available. We furthermore propose a simplification in the graph-construction method, which results in a graph that is ∼7 times less dense and yields better performance in low-resource settings while performing on par with the state of the art in high-resource settings. Finally, we introduce a new variant of Adaptive Pseudo-Labeling tailored for word-document graphs. When using as few as 20 samples for training, we outperform a strong TextGCN baseline by 17% in absolute accuracy on average over eight languages. We demonstrate that our method can be applied to document classification without any language model pretraining on a wide range of typologically diverse languages while performing on par with large pretrained language models.

CK-Transformer: Commonsense Knowledge Enhanced Transformers for Referring Expression Comprehension
Zhi Zhang | Helen Yannakoudakis | Xiantong Zhen | Ekaterina Shutova
Findings of the Association for Computational Linguistics: EACL 2023

The task of multimodal referring expression comprehension (REC), aiming at localizing an image region described by a natural language expression, has recently received increasing attention within the research community. In this paper, we specifically focus on referring expression comprehension with commonsense knowledge (KB-Ref), a task which typically requires reasoning beyond spatial, visual or semantic information. We propose a novel framework for Commonsense Knowledge Enhanced Transformers (CK-Transformer) which effectively integrates commonsense knowledge into the representations of objects in an image, facilitating identification of the target objects referred to by the expressions. We conduct extensive experiments on several benchmarks for the task of KB-Ref. Our results show that the proposed CK-Transformer achieves a new state of the art, with an absolute improvement of 3.14% accuracy over the existing state of the art.

Paper Bullets: Modeling Propaganda with the Help of Metaphor
Daniel Baleato Rodríguez | Verna Dankers | Preslav Nakov | Ekaterina Shutova
Findings of the Association for Computational Linguistics: EACL 2023

Propaganda aims to persuade an audience by appealing to emotions and using faulty reasoning, with the purpose of promoting a particular point of view. Similarly, metaphor modifies the semantic frame, thus eliciting a response that can be used to tune up or down the emotional volume of the message. Given the close relationship between them, we hypothesize that, when modeling them computationally, it can be beneficial to do so jointly. In particular, we perform multi-task learning with propaganda identification as the main task and metaphor detection as an auxiliary task. To the best of our knowledge, this is the first work that models metaphor and propaganda together. We experiment with two datasets for identifying propaganda techniques in news articles and in memes shared on social media. We find that leveraging metaphor improves model performance, particularly for the two most common propaganda techniques: loaded language and name-calling.

Cross-Lingual Transfer with Language-Specific Subnetworks for Low-Resource Dependency Parsing
Rochelle Choenni | Dan Garrette | Ekaterina Shutova
Computational Linguistics, Volume 49, Issue 3 - September 2023

Large multilingual language models typically share their parameters across all languages, which enables cross-lingual task transfer, but learning can also be hindered when training updates from different languages are in conflict. In this article, we propose novel methods for using language-specific subnetworks, which control cross-lingual parameter sharing, to reduce conflicts and increase positive transfer during fine-tuning. We introduce dynamic subnetworks, which are jointly updated with the model, and we combine our methods with meta-learning, an established, but complementary, technique for improving cross-lingual transfer. Finally, we provide extensive analyses of how each of our methods affects the models.

Probing LLMs for Joint Encoding of Linguistic Categories
Giulio Starace | Konstantinos Papakostas | Rochelle Choenni | Apostolos Panagiotopoulos | Matteo Rosati | Alina Leidinger | Ekaterina Shutova
Findings of the Association for Computational Linguistics: EMNLP 2023

Large Language Models (LLMs) exhibit impressive performance on a range of NLP tasks, due to the general-purpose linguistic knowledge acquired during pretraining. Existing model interpretability research (Tenney et al., 2019) suggests that a linguistic hierarchy emerges in the LLM layers, with lower layers better suited to solving syntactic tasks and higher layers employed for semantic processing. Yet, little is known about how encodings of different linguistic phenomena interact within the models and to what extent processing of linguistically-related categories relies on the same, shared model representations. In this paper, we propose a framework for testing the joint encoding of linguistic categories in LLMs. Focusing on syntax, we find evidence of joint encoding both at the same (related part-of-speech (POS) classes) and different (POS classes and related syntactic dependency relations) levels of linguistic hierarchy. Our cross-lingual experiments show that the same patterns hold across languages in multilingual LLMs.

The language of prompting: What linguistic properties make a prompt successful?
Alina Leidinger | Robert van Rooij | Ekaterina Shutova
Findings of the Association for Computational Linguistics: EMNLP 2023

The latest generation of LLMs can be prompted to achieve impressive zero-shot or few-shot performance in many NLP tasks. However, since performance is highly sensitive to the choice of prompts, considerable effort has been devoted to crowd-sourcing prompts or designing methods for prompt optimisation. Yet, we still lack a systematic understanding of how linguistic properties of prompts correlate with the task performance. In this work, we investigate how LLMs of different sizes, pre-trained and instruction-tuned, perform on prompts that are semantically equivalent, but vary in linguistic structure. We investigate both grammatical properties such as mood, tense, aspect and modality, as well as lexico-semantic variation through the use of synonyms. Our findings contradict the common assumption that LLMs achieve optimal performance on prompts which reflect language use in pretraining or instruction-tuning data. Prompts transfer poorly between datasets or models, and performance cannot generally be explained by perplexity, word frequency, word sense ambiguity or prompt length. Based on our results, we put forward a proposal for a more robust and comprehensive evaluation standard for prompting research.

2022

Investigating Language Relationships in Multilingual Sentence Encoders Through the Lens of Linguistic Typology
Rochelle Choenni | Ekaterina Shutova
Computational Linguistics, Volume 48, Issue 3 - September 2022

Multilingual sentence encoders have seen much success in cross-lingual model transfer for downstream NLP tasks. The success of this transfer is, however, dependent on the model’s ability to encode the patterns of cross-lingual similarity and variation. Yet, we know relatively little about the properties of individual languages or the general patterns of linguistic variation that the models encode. In this article, we investigate these questions by leveraging knowledge from the field of linguistic typology, which studies and documents structural and semantic variation across languages. We propose methods for separating language-specific subspaces within state-of-the-art multilingual sentence encoders (LASER, M-BERT, XLM, and XLM-R) with respect to a range of typological properties pertaining to lexical, morphological, and syntactic structure. Moreover, we investigate how typological information about languages is distributed across all layers of the models. Our results show interesting differences in encoding linguistic variation associated with different pretraining strategies. In addition, we propose a simple method to study how shared typological properties of languages are encoded in two state-of-the-art multilingual models—M-BERT and XLM-R. The results provide insight into their information-sharing mechanisms and suggest that these linguistic properties are encoded jointly across typologically similar languages in these models.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Wanxiang Che | Ekaterina Shutova

Meta-Learning for Fast Cross-Lingual Adaptation in Dependency Parsing
Anna Langedijk | Verna Dankers | Phillip Lippe | Sander Bos | Bryan Cardenas Guevara | Helen Yannakoudakis | Ekaterina Shutova
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Meta-learning, or learning to learn, is a technique that can help to overcome resource scarcity in cross-lingual NLP problems, by enabling fast adaptation to new tasks. We apply model-agnostic meta-learning (MAML) to the task of cross-lingual dependency parsing. We train our model on a diverse set of languages to learn a parameter initialization that can adapt quickly to new languages. We find that meta-learning with pre-training can significantly improve upon the performance of language transfer and standard supervised learning baselines for a variety of unseen, typologically diverse, and low-resource languages, in a few-shot learning setup.

Scientific and Creative Analogies in Pretrained Language Models
Tamara Czinczoll | Helen Yannakoudakis | Pushkar Mishra | Ekaterina Shutova
Findings of the Association for Computational Linguistics: EMNLP 2022

This paper examines the encoding of analogy in large-scale pretrained language models, such as BERT and GPT-2. Existing analogy datasets typically focus on a limited set of analogical relations, with a high similarity of the two domains between which the analogy holds. As a more realistic setup, we introduce the Scientific and Creative Analogy dataset (SCAN), a novel analogy dataset containing systematic mappings of multiple attributes and relational structures across dissimilar domains. Using this dataset, we test the analogical reasoning capabilities of several widely-used pretrained language models (LMs). We find that state-of-the-art LMs achieve low performance on these complex analogy tasks, highlighting the challenges still posed by analogy understanding.

2021

Multilingual and cross-lingual document classification: A meta-learning approach
Niels van der Heijden | Helen Yannakoudakis | Pushkar Mishra | Ekaterina Shutova
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

The great majority of languages in the world are considered under-resourced for successful application of deep learning methods. In this work, we propose a meta-learning approach to document classification in low-resource languages and demonstrate its effectiveness in two different settings: few-shot, cross-lingual adaptation to previously unseen languages; and multilingual joint-training when limited target-language data is available during training. We conduct a systematic comparison of several meta-learning methods, investigate multiple settings in terms of data availability, and show that meta-learning thrives in settings with a heterogeneous task distribution. We propose a simple, yet effective adjustment to existing meta-learning methods which allows for better and more stable learning, and set a new state of the art on a number of languages while performing on par on others, using only a small amount of labeled data.

Meta-Learning with Variational Semantic Memory for Word Sense Disambiguation
Yingjun Du | Nithin Holla | Xiantong Zhen | Cees Snoek | Ekaterina Shutova
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

A critical challenge faced by supervised word sense disambiguation (WSD) is the lack of large annotated datasets with sufficient coverage of words in their diversity of senses. This inspired recent research on few-shot WSD using meta-learning. While such work has successfully applied meta-learning to learn new word senses from very few examples, its performance still lags behind its fully-supervised counterpart. Aiming to further close this gap, we propose a model of semantic memory for WSD in a meta-learning setting. Semantic memory encapsulates prior experiences seen throughout the lifetime of the model, which aids better generalization in limited data settings. Our model is based on hierarchical variational inference and incorporates an adaptive memory update rule via a hypernetwork. We show our model advances the state of the art in few-shot WSD, supports effective learning in extremely data scarce (e.g. one-shot) scenarios and produces meaning prototypes that capture similar senses of distinct words.

Ruddit: Norms of Offensiveness for English Reddit Comments
Rishav Hada | Sohi Sudhir | Pushkar Mishra | Helen Yannakoudakis | Saif M. Mohammad | Ekaterina Shutova
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

On social media platforms, hateful and offensive language negatively impacts the mental well-being of users and the participation of people from diverse backgrounds. Automatic methods to detect offensive language have largely relied on datasets with categorical labels. However, comments can vary in their degree of offensiveness. We create the first dataset of English language Reddit comments that has fine-grained, real-valued scores between -1 (maximally supportive) and 1 (maximally offensive). The dataset was annotated using Best–Worst Scaling, a form of comparative annotation that has been shown to alleviate known biases of using rating scales. We show that the method produces highly reliable offensiveness scores. Finally, we evaluate the ability of widely-used neural models to predict offensiveness scores on this new dataset.

Recent advances in neural metaphor processing: A linguistic, cognitive and social perspective
Xiaoyu Tong | Ekaterina Shutova | Martha Lewis
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Metaphor is an indispensable part of human cognition and everyday communication. Much research has been conducted elucidating metaphor processing in the mind/brain and the role it plays in communication. In recent years, metaphor processing systems have benefited greatly from these studies, as well as the rapid advances in deep learning for natural language processing (NLP). This paper provides a comprehensive review and discussion of recent developments in automated metaphor processing, in light of the findings about metaphor in the mind, language, and communication, and from the perspective of downstream NLP tasks.

Stepmothers are mean and academics are pretentious: What do pretrained language models learn about you?
Rochelle Choenni | Ekaterina Shutova | Robert van Rooij
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

In this paper, we investigate what types of stereotypical information are captured by pretrained language models. We present the first dataset comprising stereotypical attributes of a range of social groups and propose a method to elicit stereotypes encoded by pretrained language models in an unsupervised fashion. Moreover, we link the emergent stereotypes to their manifestation as basic emotions as a means to study their emotional effects in a more generalized manner. To demonstrate how our methods can be used to analyze emotion and stereotype shifts due to linguistic experience, we use fine-tuning on news sources as a case study. Our experiments expose how attitudes towards different social groups vary across models and how quickly emotions and stereotypes can shift at the fine-tuning stage.

Modeling Users and Online Communities for Abuse Detection: A Position on Ethics and Explainability
Pushkar Mishra | Helen Yannakoudakis | Ekaterina Shutova
Findings of the Association for Computational Linguistics: EMNLP 2021

Abuse on the Internet is an important societal problem of our time. Millions of Internet users face harassment, racism, personal attacks, and other types of abuse across various platforms. The psychological effects of abuse on individuals can be profound and lasting. Consequently, over the past few years, there has been a substantial research effort towards automated abusive language detection in the field of NLP. In this position paper, we discuss the role that modeling of users and online communities plays in abuse detection. Specifically, we review and analyze the state of the art methods that leverage user or community information to enhance the understanding and detection of abusive language. We then explore the ethical challenges of incorporating user and community information, laying out considerations to guide future research. Finally, we address the topic of explainability in abusive language detection, proposing properties that an explainable method should aim to exhibit. We describe how user and community information can facilitate the realization of these properties and discuss the effective operationalization of explainability in view of the properties.

Us vs. Them: A Dataset of Populist Attitudes, News Bias and Emotions
Pere-Lluís Huguet Cabot | David Abadi | Agneta Fischer | Ekaterina Shutova
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Computational modelling of political discourse tasks has become an increasingly important area of research in the field of natural language processing. Populist rhetoric has risen across the political sphere in recent years; however, due to its complex nature, computational approaches to it have been scarce. In this paper, we present the new Us vs. Them dataset, consisting of 6861 Reddit comments annotated for populist attitudes and the first large-scale computational models of this phenomenon. We investigate the relationship between populist mindsets and social groups, as well as a range of emotions typically associated with these. We set a baseline for two tasks associated with populist attitudes and present a set of multi-task learning models that leverage and demonstrate the importance of emotion and group identification as auxiliary tasks.

2020

The Pragmatics behind Politics: Modelling Metaphor, Framing and Emotion in Political Discourse
Pere-Lluís Huguet Cabot | Verna Dankers | David Abadi | Agneta Fischer | Ekaterina Shutova
Findings of the Association for Computational Linguistics: EMNLP 2020

There has been an increased interest in modelling political discourse within the natural language processing (NLP) community, in tasks such as political bias and misinformation detection, among others. Metaphor-rich and emotion-eliciting communication strategies are ubiquitous in political rhetoric, according to social science research. Yet, none of the existing computational models of political discourse has incorporated these phenomena. In this paper, we present the first joint models of metaphor, emotion and political rhetoric, and demonstrate that they advance performance in three tasks: predicting political perspective of news articles, party affiliation of politicians and framing of policy issues.

Proceedings of the Second Workshop on Figurative Language Processing
Beata Beigman Klebanov | Ekaterina Shutova | Patricia Lichtenstein | Smaranda Muresan | Chee Wee | Anna Feldman | Debanjan Ghosh
Proceedings of the Second Workshop on Figurative Language Processing

Decoding Brain Activity Associated with Literal and Metaphoric Sentence Comprehension Using Distributional Semantic Models
Vesna G. Djokic | Jean Maillard | Luana Bulat | Ekaterina Shutova
Transactions of the Association for Computational Linguistics, Volume 8

Recent years have seen a growing interest within the natural language processing (NLP) community in evaluating the ability of semantic models to capture human meaning representation in the brain. Existing research has mainly focused on applying semantic models to decode brain activity patterns associated with the meaning of individual words, and, more recently, this approach has been extended to sentences and larger text fragments. Our work is the first to investigate metaphor processing in the brain in this context. We evaluate a range of semantic models (word embeddings, compositional, and visual models) in their ability to decode brain activity associated with reading of both literal and metaphoric sentences. Our results suggest that compositional models and word embeddings are able to capture differences in the processing of literal and metaphoric sentences, providing support for the idea that the literal meaning is not fully accessible during familiar metaphor comprehension.

Joint Modelling of Emotion and Abusive Language Detection
Santhosh Rajamanickam | Pushkar Mishra | Helen Yannakoudakis | Ekaterina Shutova
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The rise of online communication platforms has been accompanied by some undesirable effects, such as the proliferation of aggressive and abusive behaviour online. Aiming to tackle this problem, the natural language processing (NLP) community has experimented with a range of techniques for abuse detection. While achieving substantial success, these methods have so far only focused on modelling the linguistic properties of the comments and the online communities of users, disregarding the emotional state of the users and how this might affect their language. The latter is, however, inextricably linked to abusive behaviour. In this paper, we present the first joint model of emotion and abusive language detection, experimenting in a multi-task learning framework that allows one task to inform the other. Our results demonstrate that incorporating affective features leads to significant improvements in abuse detection performance across datasets.

Proceedings of the Fourteenth Workshop on Semantic Evaluation
Aurelie Herbelot | Xiaodan Zhu | Alexis Palmer | Nathan Schneider | Jonathan May | Ekaterina Shutova
Proceedings of the Fourteenth Workshop on Semantic Evaluation

Learning to Learn to Disambiguate: Meta-Learning for Few-Shot Word Sense Disambiguation
Nithin Holla | Pushkar Mishra | Helen Yannakoudakis | Ekaterina Shutova
Findings of the Association for Computational Linguistics: EMNLP 2020

The success of deep learning methods hinges on the availability of large training datasets annotated for the task of interest. In contrast to human intelligence, these methods lack versatility and struggle to learn and adapt quickly to new tasks, where labeled data is scarce. Meta-learning aims to solve this problem by training a model on a large number of few-shot tasks, with an objective to learn new tasks quickly from a small number of examples. In this paper, we propose a meta-learning framework for few-shot word sense disambiguation (WSD), where the goal is to learn to disambiguate unseen words from only a few labeled instances. Meta-learning approaches have so far been typically tested in an N-way, K-shot classification setting where each task has N classes with K examples per class. Owing to its nature, WSD deviates from this controlled setup and requires the models to handle a large number of highly unbalanced classes. We extend several popular meta-learning approaches to this scenario, and analyze their strengths and weaknesses in this new challenging setting.

Being neighbourly: Neural metaphor identification in discourse
Verna Dankers | Karan Malhotra | Gaurav Kudva | Volodymyr Medentsiy | Ekaterina Shutova
Proceedings of the Second Workshop on Figurative Language Processing

Existing approaches to metaphor processing typically rely on local features, such as immediate lexico-syntactic contexts or information within a given sentence. However, a large body of corpus-linguistic research suggests that situational information and broader discourse properties influence metaphor production and comprehension. In this paper, we present the first neural metaphor processing architecture that models a broader discourse through the use of attention mechanisms. Our models advance the state of the art on the all POS track of the 2018 VU Amsterdam metaphor identification task. The inclusion of discourse-level information yields further significant improvements.

2019

Proceedings of the 13th International Workshop on Semantic Evaluation
Jonathan May | Ekaterina Shutova | Aurelie Herbelot | Xiaodan Zhu | Marianna Apidianaki | Saif M. Mohammad
Proceedings of the 13th International Workshop on Semantic Evaluation

Deconstructing multimodality: visual properties and visual context in human semantic processing
Christopher Davis | Luana Bulat | Anita Lilla Vero | Ekaterina Shutova
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

Multimodal semantic models that extend linguistic representations with additional perceptual input have proved successful in a range of natural language processing (NLP) tasks. Recent research has successfully used neural methods to automatically create visual representations for words. However, these works have extracted visual features from complete images, and have not examined how different kinds of visual information impact performance. In contrast, we construct multimodal models that differentiate between internal visual properties of the objects and their external visual context. We evaluate the models on the task of decoding brain activity associated with the meanings of nouns, demonstrating their advantage over those based on complete images.

Modelling the interplay of metaphor and emotion through multitask learning
Verna Dankers | Marek Rei | Martha Lewis | Ekaterina Shutova
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Metaphors allow us to convey emotion by connecting physical experiences and abstract concepts. The results of previous research in linguistics and psychology suggest that metaphorical phrases tend to be more emotionally evocative than their literal counterparts. In this paper, we investigate the relationship between metaphor and emotion within a computational framework, by proposing the first joint model of these phenomena. We experiment with several multitask learning architectures for this purpose, involving both hard and soft parameter sharing. Our results demonstrate that metaphor identification and emotion prediction mutually benefit from joint learning and our models advance the state of the art in both of these tasks.

Abusive Language Detection with Graph Convolutional Networks
Pushkar Mishra | Marco Del Tredici | Helen Yannakoudakis | Ekaterina Shutova
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Abuse on the Internet represents a significant societal problem of our time. Previous research on automated abusive language detection in Twitter has shown that community-based profiling of users is a promising technique for this task. However, existing approaches only capture shallow properties of online communities by modeling follower–following relationships. In contrast, working with graph convolutional networks (GCNs), we present the first approach that captures not only the structure of online communities but also the linguistic behavior of the users within them. We show that such a heterogeneous graph-structured modeling of communities significantly advances the current state of the art in abusive language detection.

Learning Outside the Box: Discourse-level Features Improve Metaphor Identification
Jesse Mu | Helen Yannakoudakis | Ekaterina Shutova
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Most current approaches to metaphor identification use restricted linguistic contexts, e.g. by considering only a verb’s arguments or the sentence containing a phrase. Inspired by pragmatic accounts of metaphor, we argue that broader discourse features are crucial for better metaphor identification. We train simple gradient boosting classifiers on representations of an utterance and its surrounding discourse learned with a variety of document embedding methods, obtaining near state-of-the-art results on the 2018 VU Amsterdam metaphor identification task without the complex metaphor-specific features or deep neural architectures employed by other systems. A qualitative analysis further confirms the need for broader context in metaphor processing.

Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing
Edoardo Maria Ponti | Helen O’Horan | Yevgeni Berzak | Ivan Vulić | Roi Reichart | Thierry Poibeau | Ekaterina Shutova | Anna Korhonen
Computational Linguistics, Volume 45, Issue 3 - September 2019

Linguistic typology aims to capture structural and semantic variation across the world’s languages. A large-scale typology could provide excellent guidance for multilingual Natural Language Processing (NLP), particularly for languages that suffer from the lack of human labeled resources. We present an extensive literature survey on the use of typological information in the development of NLP techniques. Our survey demonstrates that to date, the use of information in existing typological databases has resulted in consistent but modest improvements in system performance. We show that this is due to both intrinsic limitations of databases (in terms of coverage and feature granularity) and under-utilization of the typological features included in them. We advocate for a new approach that adapts the broad and discrete nature of typological categories to the contextual and continuous nature of machine learning algorithms used in contemporary NLP. In particular, we suggest that such an approach could be facilitated by recent developments in data-driven induction of typological knowledge.

Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)
Rada Mihalcea | Ekaterina Shutova | Lun-Wei Ku | Kilian Evang | Soujanya Poria
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

CAMsterdam at SemEval-2019 Task 6: Neural and graph-based feature extraction for the identification of offensive tweets
Guy Aglionby | Chris Davis | Pushkar Mishra | Andrew Caines | Helen Yannakoudakis | Marek Rei | Ekaterina Shutova | Paula Buttery
Proceedings of the 13th International Workshop on Semantic Evaluation

We describe the CAMsterdam team entry to the SemEval-2019 Shared Task 6 on offensive language identification in Twitter data. Our proposed model learns to extract textual features using a multi-layer recurrent network, and then performs text classification using gradient-boosted decision trees (GBDT). A self-attention architecture enables the model to focus on the most relevant areas in the text. In order to enrich input representations, we use node2vec to learn globally optimised embeddings for hashtags, which are then given as additional features to the GBDT classifier. Our best model obtains 78.79% macro F1-score on detecting offensive language (subtask A), 66.32% on categorising offence types (targeted/untargeted; subtask B), and 55.36% on identifying the target of offence (subtask C).

Modeling Affirmative and Negated Action Processing in the Brain with Lexical and Compositional Semantic Models
Vesna Djokic | Jean Maillard | Luana Bulat | Ekaterina Shutova
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Recent work shows that distributional semantic models can be used to decode patterns of brain activity associated with individual words and sentence meanings. However, it is yet unclear to what extent such models can be used to study and decode fMRI patterns associated with specific aspects of semantic composition such as the negation function. In this paper, we apply lexical and compositional semantic models to decode fMRI patterns associated with negated and affirmative sentences containing hand-action verbs. Our results show reduced decoding (correlation) of sentences where the verb is in the negated context, as compared to the affirmative one, within brain regions implicated in action-semantic processing. This supports behavioral and brain imaging studies, suggesting that negation involves reduced access to aspects of the affirmative mental representation. The results pave the way for testing alternate semantic models of negation against human semantic processing in the brain.

2018

Proceedings of the Workshop on Figurative Language Processing
Beata Beigman Klebanov | Ekaterina Shutova | Patricia Lichtenstein | Smaranda Muresan | Chee Wee
Proceedings of the Workshop on Figurative Language Processing

A Report on the 2018 VUA Metaphor Detection Shared Task
Chee Wee (Ben) Leong | Beata Beigman Klebanov | Ekaterina Shutova
Proceedings of the Workshop on Figurative Language Processing

As the community working on computational approaches to figurative language is growing and as methods and data become increasingly diverse, it is important to create widely shared empirical knowledge of the level of system performance in a range of contexts, thus facilitating progress in this area. One way of creating such shared knowledge is through benchmarking multiple systems on a common dataset. We report on the shared task on metaphor identification on the VU Amsterdam Metaphor Corpus conducted at the NAACL 2018 Workshop on Figurative Language Processing.

Proceedings of the 12th International Workshop on Semantic Evaluation
Marianna Apidianaki | Saif M. Mohammad | Jonathan May | Ekaterina Shutova | Steven Bethard | Marine Carpuat
Proceedings of the 12th International Workshop on Semantic Evaluation

Author Profiling for Abuse Detection
Pushkar Mishra | Marco Del Tredici | Helen Yannakoudakis | Ekaterina Shutova
Proceedings of the 27th International Conference on Computational Linguistics

The rapid growth of social media in recent years has fed into some highly undesirable phenomena such as proliferation of hateful and offensive language on the Internet. Previous research suggests that such abusive content tends to come from users who share a set of common stereotypes and form communities around them. The current state-of-the-art approaches to abuse detection are oblivious to user and community information and rely entirely on textual (i.e., lexical and semantic) cues. In this paper, we propose a novel approach to this problem that incorporates community-based profiling features of Twitter users. Experimenting with a dataset of 16k tweets, we show that our methods significantly outperform the current state of the art in abuse detection. Further, we conduct a qualitative analysis of model characteristics. We release our code, pre-trained models and all the resources used in the public domain.

Neural Character-based Composition Models for Abuse Detection
Pushkar Mishra | Helen Yannakoudakis | Ekaterina Shutova
Proceedings of the 2nd Workshop on Abusive Language Online (ALW2)

The advent of social media in recent years has fed into some highly undesirable phenomena such as proliferation of offensive language, hate speech, sexist remarks, etc. on the Internet. In light of this, there have been several efforts to automate the detection and moderation of such abusive content. However, deliberate obfuscation of words by users to evade detection poses a serious challenge to the effectiveness of these efforts. The current state of the art approaches to abusive language detection, based on recurrent neural networks, do not explicitly address this problem and resort to a generic OOV (out of vocabulary) embedding for unseen words. However, in using a single embedding for all unseen words we lose the ability to distinguish between obfuscated and non-obfuscated or rare words. In this paper, we address this problem by designing a model that can compose embeddings for unseen words. We experimentally demonstrate that our approach significantly advances the current state of the art in abuse detection on datasets from two different domains, namely Twitter and Wikipedia talk page.

2017

Speaking, Seeing, Understanding: Correlating semantic models with conceptual representation in the brain
Luana Bulat | Stephen Clark | Ekaterina Shutova
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Research in computational semantics is increasingly guided by our understanding of human semantic processing. However, semantic models are typically studied in the context of natural language processing system performance. In this paper, we present a systematic evaluation and comparison of a range of widely-used, state-of-the-art semantic models in their ability to predict patterns of conceptual representation in the human brain. Our results provide new insights both for the design of computational semantic models and for further research in cognitive neuroscience.

Multilingual Metaphor Processing: Experiments with Semi-Supervised and Unsupervised Learning
Ekaterina Shutova | Lin Sun | Elkin Darío Gutiérrez | Patricia Lichtenstein | Srini Narayanan
Computational Linguistics, Volume 43, Issue 1 - April 2017

Highly frequent in language and communication, metaphor represents a significant challenge for Natural Language Processing (NLP) applications. Computational work on metaphor has traditionally evolved around the use of hand-coded knowledge, making the systems hard to scale. Recent years have witnessed a rise in statistical approaches to metaphor processing. However, these approaches often require extensive human annotation effort and are predominantly evaluated within a limited domain. In contrast, we experiment with weakly supervised and unsupervised techniques—with little or no annotation—to generalize higher-level mechanisms of metaphor from distributional properties of concepts. We investigate different levels and types of supervision (learning from linguistic examples vs. learning from a given set of metaphorical mappings vs. learning without annotation) in flat and hierarchical, unconstrained and constrained clustering settings. Our aim is to identify the optimal type of supervision for a learning algorithm that discovers patterns of metaphorical association from text. In order to investigate the scalability and adaptability of our models, we applied them to data in three languages from different language groups—English, Spanish, and Russian—achieving state-of-the-art results with little supervision. Finally, we demonstrate that statistical methods can facilitate and scale up cross-linguistic research on metaphor.

Modelling metaphor with attribute-based semantics
Luana Bulat | Stephen Clark | Ekaterina Shutova
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

One of the key problems in computational metaphor modelling is finding the optimal level of abstraction of semantic representations, such that these are able to capture and generalise metaphorical mechanisms. In this paper we present the first metaphor identification method that uses representations constructed from property norms. Such norms have been previously shown to provide a cognitively plausible representation of concepts in terms of semantic properties. Our results demonstrate that such property-based semantic representations provide a suitable model of cross-domain knowledge projection in metaphors, outperforming standard distributional models on a metaphor identification task.

Modelling semantic acquisition in second language learning
Ekaterina Kochmar | Ekaterina Shutova
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

Using methods of statistical analysis, we investigate how semantic knowledge is acquired in English as a second language and evaluate the pace of development across a number of predicate types and content word combinations, as well as across the levels of language proficiency and native languages. Our exploratory study helps identify the most problematic areas for language learners with different backgrounds and at different stages of learning.

Semantic Frames and Visual Scenes: Learning Semantic Role Inventories from Image and Video Descriptions
Ekaterina Shutova | Andreas Wundsam | Helen Yannakoudakis
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

Frame-semantic parsing and semantic role labelling, which aim to automatically assign semantic roles to arguments of verbs in a sentence, have become an active strand of research in NLP. However, to date these methods have relied on a predefined inventory of semantic roles. In this paper, we present a method to automatically learn argument role inventories for verbs from large corpora of text, images and videos. We evaluate the method against manually constructed role inventories in FrameNet and show that the visual model outperforms the language-only model and operates with a high precision.

Grasping the Finer Point: A Supervised Similarity Network for Metaphor Detection
Marek Rei | Luana Bulat | Douwe Kiela | Ekaterina Shutova
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

The ubiquity of metaphor in our everyday communication makes it an important problem for natural language understanding. Yet, the majority of metaphor processing systems to date rely on hand-engineered features and there is still no consensus in the field as to which features are optimal for this task. In this paper, we present the first deep learning architecture designed to capture metaphorical composition. Our results demonstrate that it outperforms the existing approaches in the metaphor identification task.

2016

Semantic classifications for detection of verb metaphors
Beata Beigman Klebanov | Chee Wee Leong | E. Dario Gutierrez | Ekaterina Shutova | Michael Flor
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Metaphor as a Medium for Emotion: An Empirical Study
Saif Mohammad | Ekaterina Shutova | Peter Turney
Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics

Literal and Metaphorical Senses in Compositional Distributional Semantic Models
E. Dario Gutiérrez | Ekaterina Shutova | Tyler Marghetis | Benjamin Bergen
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Black Holes and White Rabbits: Metaphor Identification with Visual Features
Ekaterina Shutova | Douwe Kiela | Jean Maillard
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Detecting Cross-Cultural Differences Using a Multilingual Topic Model
E.D. Gutiérrez | Ekaterina Shutova | Patricia Lichtenstein | Gerard de Melo | Luca Gilardi
Transactions of the Association for Computational Linguistics, Volume 4

Understanding cross-cultural differences has important implications for world affairs and many aspects of the life of society. Yet, the majority of text-mining methods to date focus on the analysis of monolingual texts. In contrast, we present a statistical model that simultaneously learns a set of common topics from multilingual, non-parallel data and automatically discovers the differences in perspectives on these topics across linguistic communities. We perform a behavioural evaluation of a subset of the differences identified by our model in English and Spanish to investigate their psychological validity.

Cross-Lingual Lexico-Semantic Transfer in Language Learning
Ekaterina Kochmar | Ekaterina Shutova
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Proceedings of the Fourth Workshop on Metaphor in NLP
Beata Beigman Klebanov | Ekaterina Shutova | Patricia Lichtenstein
Proceedings of the Fourth Workshop on Metaphor in NLP

2015

Proceedings of the Third Workshop on Metaphor in NLP
Ekaterina Shutova | Beata Beigman Klebanov | Patricia Lichtenstein
Proceedings of the Third Workshop on Metaphor in NLP

Design and Evaluation of Metaphor Processing Systems
Ekaterina Shutova
Computational Linguistics, Volume 41, Issue 4 - December 2015

SemEval-2015 Task 11: Sentiment Analysis of Figurative Language in Twitter
Aniruddha Ghosh | Guofu Li | Tony Veale | Paolo Rosso | Ekaterina Shutova | John Barnden | Antonio Reyes
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

Perceptually Grounded Selectional Preferences
Ekaterina Shutova | Niket Tandon | Gerard de Melo
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

Proceedings of the Second Workshop on Metaphor in NLP
Beata Beigman Klebanov | Ekaterina Shutova | Patricia Lichtenstein
Proceedings of the Second Workshop on Metaphor in NLP

2013

Metaphor Identification as Interpretation
Ekaterina Shutova
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity

Statistical Metaphor Processing
Ekaterina Shutova | Simone Teufel | Anna Korhonen
Computational Linguistics, Volume 39, Issue 2 - June 2013

Proceedings of the First Workshop on Metaphor in NLP
Ekaterina Shutova | Beata Beigman Klebanov | Joel Tetreault | Zornitsa Kozareva
Proceedings of the First Workshop on Metaphor in NLP

Unsupervised Metaphor Identification Using Hierarchical Graph Factorization Clustering
Ekaterina Shutova | Lin Sun
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2012

Unsupervised Metaphor Paraphrasing using a Vector Space Model
Ekaterina Shutova | Tim Van de Cruys | Anna Korhonen
Proceedings of COLING 2012: Posters

2010

Models of Metaphor in NLP
Ekaterina Shutova
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Metaphor Corpus Annotated for Source - Target Domain Mappings
Ekaterina Shutova | Simone Teufel
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Besides making our thoughts more vivid and filling our communication with richer imagery, metaphor also plays an important structural role in our cognition. Although there is a consensus in the linguistics and NLP research communities that the phenomenon of metaphor is not restricted to similarity-based extensions of meanings of isolated words, but rather involves reconceptualization of a whole area of experience (target domain) in terms of another (source domain), there still has been no proposal for a comprehensive procedure for annotation of cross-domain mappings. However, a corpus annotated for conceptual mappings could provide a new starting point for both linguistic and cognitive experiments. The annotation scheme we present in this paper is a step towards filling this gap. We test our procedure in an experimental setting involving multiple annotators and estimate their agreement on the task. The associated corpus annotated for source ― target domain mappings will be publicly available.

Automatic Metaphor Interpretation as a Paraphrasing Task
Ekaterina Shutova
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Metaphor Identification Using Verb and Noun Clustering
Ekaterina Shutova | Lin Sun | Anna Korhonen
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

2009

Sense-based Interpretation of Logical Metonymy Using a Statistical Method
Ekaterina Shutova
Proceedings of the ACL-IJCNLP 2009 Student Research Workshop

Co-authors

Verna Dankers 5

Anna Korhonen 4

Saif Mohammad 4

E. Dario Gutierrez 3

Alina Leidinger 3

Jean Maillard 3

Joyce Nabende 3

Mohammad Taher Pilehvar 3

Robert Van Rooij 3

Marianna Apidianaki 2

Stephen Clark 2

Gerard De Melo 2

Agneta Fischer 2

Aurélie Herbelot 2

Daniel Hershcovich 2

Pere-Lluís Huguet Cabot 2

Ekaterina Kochmar 2

Chee Wee Leong 2

Christof Monz 2

Smaranda Muresan 2

Simone Teufel 2

Marco Del Tredici 2

Xiantong Zhen 2

Niels van der Heijden 2

Daniel Baleato Rodríguez 1

Benjamin Bergen 1

Yevgeni Berzak 1

Steven Bethard 1

Paula Buttery 1

Andrew Caines 1

Bryan Cardenas Guevara 1

Marine Carpuat 1

Tamara Czinczoll 1

Christopher Davis 1

Chris Irwin Davis 1

Thierry Declerck 1

Vesna G. Djokic 1

Debanjan Ghosh 1

Aniruddha Ghosh 1

E.D. Gutiérrez 1

Tamar Johnson 1

Alexander Koller 1

Zornitsa Kozareva 1

Anna Langedijk 1

Anne Lauscher 1

Phillip Lippe 1

Karan Malhotra 1

Tyler Marghetis 1

Volodymyr Medentsiy 1

Rada Mihalcea 1

Preslav Nakov 1

Srini Narayanan 1

Roberto Navigli 1

Helen O’Horan 1

Alexis Palmer 1

Apostolos Panagiotopoulos 1

Konstantinos Papakostas 1

Thierry Poibeau 1

Edoardo Maria Ponti 1

Soujanya Poria 1

Santhosh Rajamanickam 1

Antonio Reyes 1

Matteo Rosati 1

Nathan Schneider 1

Steven Schockaert 1

Rico Sennrich 1

Giulio Starace 1

Claire E Stevenson 1

Simone Tedeschi 1

Joel Tetreault 1

Tim Van de Cruys 1

Ivo Verhoeven 1

Anita Lilla Verő 1

Andreas Wundsam 1

Srishti Yadav 1

Mathilde ter Veen 1

Han van der Maas 1

Venues