Serra Sinem Tekiroğlu

Also published as: Serra Sinem Tekiroglu


Towards Knowledge-Grounded Counter Narrative Generation for Hate Speech
Yi-Ling Chung | Serra Sinem Tekiroğlu | Marco Guerini
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Human-in-the-Loop for Data Collection: a Multi-Target Counter Narrative Dataset to Fight Online Hate Speech
Margherita Fanton | Helena Bonaldi | Serra Sinem Tekiroğlu | Marco Guerini
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Undermining the impact of hateful content with informed and non-aggressive responses, called counter narratives, has emerged as a possible solution for having healthier online communities. Thus, some NLP studies have started addressing the task of counter narrative generation. Although such studies have made an effort to build hate speech / counter narrative (HS/CN) datasets for neural generation, they fall short of reaching high quality, high quantity, or both. In this paper, we propose a novel human-in-the-loop data collection methodology in which a generative language model is refined iteratively by using its own data from the previous loops to generate new training samples that experts review and/or post-edit. Our experiments comprised several loops including diverse dynamic variations. Results show that the methodology is scalable and facilitates diverse, novel, and cost-effective data collection. To our knowledge, the resulting dataset is the only expert-based multi-target HS/CN dataset available to the community.


Generating Counter Narratives against Online Hate Speech: Data and Strategies
Serra Sinem Tekiroğlu | Yi-Ling Chung | Marco Guerini
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Recently, research has started focusing on avoiding undesired effects that come with content moderation, such as censorship and overblocking, when dealing with hatred online. The core idea is to directly intervene in the discussion with textual responses that are meant to counter the hate content and prevent it from further spreading. Accordingly, automation strategies, such as natural language generation, are beginning to be investigated. Still, they suffer from the lack of a sufficient amount of quality data and tend to produce generic/repetitive responses. Being aware of the aforementioned limitations, we present a study on how to collect responses to hate effectively, employing large-scale unsupervised language models such as GPT-2 for the generation of silver data, and the best annotation strategies/neural architectures that can be used for data filtering before expert validation/post-editing.

Toward Stance-based Personas for Opinionated Dialogues
Thomas Scialom | Serra Sinem Tekiroğlu | Jacopo Staiano | Marco Guerini
Findings of the Association for Computational Linguistics: EMNLP 2020

In the context of chit-chat dialogues it has been shown that endowing systems with a persona profile is important to produce more coherent and meaningful conversations. Still, the representation of such personas has thus far been limited to a fact-based representation (e.g. “I have two cats.”). We argue that these representations remain superficial w.r.t. the complexity of human personality. In this work, we propose to take a step forward and investigate stance-based personas, trying to grasp more profound characteristics, such as opinions, values, and beliefs, to drive language generation. To this end, we introduce a novel dataset that allows exploring different stance-based persona representations and their impact on claim generation, showing that they are able to grasp abstract and profound aspects of the author persona.


Generating Challenge Datasets for Task-Oriented Conversational Agents through Self-Play
Sourabh Majumdar | Serra Sinem Tekiroglu | Marco Guerini
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

End-to-end neural approaches are becoming increasingly common in conversational scenarios due to their promising performance when provided with a sufficient amount of data. In this paper, we present a novel methodology to address the interpretability of neural approaches in such scenarios by creating challenge datasets using dialogue self-play over multiple tasks/intents. Dialogue self-play allows generating large amounts of synthetic data; by taking advantage of the complete control over the generation process, we show how neural approaches can be evaluated in terms of unseen dialogue patterns. We propose several out-of-pattern test cases, each of which introduces a natural and unexpected user utterance phenomenon. As a proof of concept, we built a single and a multiple memory network, and show that these two architectures have diverse performances depending on the peculiar dialogue patterns.

CONAN - COunter NArratives through Nichesourcing: a Multilingual Dataset of Responses to Fight Online Hate Speech
Yi-Ling Chung | Elizaveta Kuzmenko | Serra Sinem Tekiroglu | Marco Guerini
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Although there is an unprecedented effort to provide adequate responses in terms of laws and policies to hate content on social media platforms, dealing with hatred online is still a tough problem. Tackling hate speech in the standard way of content deletion or user suspension may be charged with censorship and overblocking. One alternative strategy, which has so far received little attention from the research community, is to actually oppose hate content with counter-narratives (i.e. informed textual responses). In this paper, we describe the creation of the first large-scale, multilingual, expert-based dataset of hate-speech/counter-narrative pairs. This dataset has been built with the effort of more than 100 operators from three different NGOs that applied their training and expertise to the task. Together with the collected data we also provide additional annotations about expert demographics, hate and response type, and data augmentation through translation and paraphrasing. Finally, we provide initial experiments to assess the quality of our data.

FASTDial: Abstracting Dialogue Policies for Fast Development of Task Oriented Agents
Serra Sinem Tekiroglu | Bernardo Magnini | Marco Guerini
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We present a novel abstraction framework called FASTDial for designing task-oriented dialogue agents, built on top of the OpenDial toolkit. This framework is meant to facilitate prototyping and development of dialogue systems from scratch, even by non-tech-savvy users, especially when limited training data is available. To this end, we use a generic and simple frame-slots data structure with pre-defined dialogue policies that allows for fast design and implementation at the price of some reduction in flexibility. Moreover, it allows for minimizing programming effort and domain expert training time, by hiding away many implementation details. We provide a system demonstration screencast video at the following link:


A Computational Exploration of Exaggeration
Enrica Troiano | Carlo Strapparava | Gözde Özbal | Serra Sinem Tekiroğlu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Several NLP studies address the problem of figurative language, but among non-literal phenomena, they have neglected exaggeration. This paper presents a first computational approach to this figure of speech. We explore the possibility of automatically detecting exaggerated sentences. First, we introduce HYPO, a corpus containing overstatements (or hyperboles) collected on the web and validated via crowdsourcing. Then, we evaluate a number of models trained on HYPO, and provide evidence that the task of hyperbole identification can be successfully performed based on a small set of semantic features.


Learning to Identify Metaphors from a Corpus of Proverbs
Gözde Özbal | Carlo Strapparava | Serra Sinem Tekiroğlu | Daniele Pighin
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

PROMETHEUS: A Corpus of Proverbs Annotated with Metaphors
Gözde Özbal | Carlo Strapparava | Serra Sinem Tekiroğlu
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Proverbs are commonly metaphoric in nature and the mapping across domains is commonly established in proverbs. The abundance of proverbs in terms of metaphors makes them an extremely valuable linguistic resource since they can be utilized as a gold standard for various metaphor related linguistic tasks such as metaphor identification or interpretation. Besides, a collection of proverbs from various languages annotated with metaphors would also be essential for social scientists to explore the cultural differences between those languages. In this paper, we introduce PROMETHEUS, a dataset consisting of English proverbs and their equivalents in Italian. In addition to the word-level metaphor annotations for each proverb, PROMETHEUS contains other types of information such as the metaphoricity degree of the overall proverb, its meaning, the century that it was first recorded in and a pair of subjective questions responded by the annotators. To the best of our knowledge, this is the first multi-lingual and open-domain corpus of proverbs annotated with word-level metaphors.


Exploring Sensorial Features for Metaphor Identification
Serra Sinem Tekiroğlu | Gözde Özbal | Carlo Strapparava
Proceedings of the Third Workshop on Metaphor in NLP


Sensicon: An Automatically Constructed Sensorial Lexicon
Serra Sinem Tekiroğlu | Gözde Özbal | Carlo Strapparava
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

A Computational Approach to Generate a Sensorial Lexicon
Serra Sinem Tekiroğlu | Gözde Özbal | Carlo Strapparava
Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex)