Leon Derczynski

Also published as: Leon Strømberg-Derczynski


2024

pdf bib
Countering Hateful and Offensive Speech Online - Open Challenges
Leon Derczynski | Marco Guerini | Debora Nozza | Flor Miriam Plaza-del-Arco | Jeffrey Sorensen | Marcos Zampieri
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

In today’s digital age, hate speech and offensive speech online pose a significant challenge to maintaining respectful and inclusive online environments. This tutorial aims to provide attendees with a comprehensive understanding of the field by delving into essential dimensions such as multilingualism, counter-narrative generation, a hands-on session with one of the most popular APIs for detecting hate speech, fairness, and ethics in AI, and the use of recent advanced approaches. In addition, the tutorial aims to foster collaboration and inspire participants to create safer online spaces by detecting and mitigating hate speech.

pdf bib
Combating Security and Privacy Issues in the Era of Large Language Models
Muhao Chen | Chaowei Xiao | Huan Sun | Lei Li | Leon Derczynski | Anima Anandkumar | Fei Wang
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 5: Tutorial Abstracts)

This tutorial seeks to provide a systematic summary of risks and vulnerabilities in security, privacy and copyright aspects of large language models (LLMs), and most recent solutions to address those issues. We will discuss a broad thread of studies that try to answer the following questions: (i) How do we unravel the adversarial threats that attackers may leverage in the training time of LLMs, especially those that may exist in recent paradigms of instruction tuning and RLHF processes? (ii) How do we guard the LLMs against malicious attacks in inference time, such as attacks based on backdoors and jailbreaking? (iii) How do we ensure privacy protection of user information and LLM decisions for Language Model as-a-Service (LMaaS)? (iv) How do we protect the copyright of an LLM? (v) How do we detect and prevent cases where personal or confidential information is leaked during LLM training? (vi) How should we make policies to control against improper usage of LLM-generated content? In addition, will conclude the discussions by outlining emergent challenges in security, privacy and reliability of LLMs that deserve timely investigation by the community

2023

pdf bib
TeamAmpa at SemEval-2023 Task 3: Exploring Multilabel and Multilingual RoBERTa Models for Persuasion and Framing Detection
Amalie Pauli | Rafael Sarabia | Leon Derczynski | Ira Assent
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper describes our submission to theSemEval 2023 Task 3 on two subtasks: detectingpersuasion techniques and framing. Bothsubtasks are multi-label classification problems. We present a set of experiments, exploring howto get robust performance across languages usingpre-trained RoBERTa models. We test differentoversampling strategies, a strategy ofadding textual features from predictions obtainedwith related models, and present bothinconclusive and negative results. We achievea robust ranking across languages and subtaskswith our best ranking being nr. 1 for Subtask 3on Spanish.

pdf bib
RWKV: Reinventing RNNs for the Transformer Era
Bo Peng | Eric Alcaide | Quentin Anthony | Alon Albalak | Samuel Arcadinho | Stella Biderman | Huanqi Cao | Xin Cheng | Michael Chung | Leon Derczynski | Xingjian Du | Matteo Grella | Kranthi Gv | Xuzheng He | Haowen Hou | Przemyslaw Kazienko | Jan Kocon | Jiaming Kong | Bartłomiej Koptyra | Hayden Lau | Jiaju Lin | Krishna Sri Ipsit Mantri | Ferdinand Mom | Atsushi Saito | Guangyu Song | Xiangru Tang | Johan Wind | Stanisław Woźniak | Zhenyuan Zhang | Qinghua Zhou | Jian Zhu | Rui-Jie Zhu
Findings of the Association for Computational Linguistics: EMNLP 2023

Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length. In contrast, recurrent neural networks (RNNs) exhibit linear scaling in memory and computational requirements but struggle to match the same performance as Transformers due to limitations in parallelization and scalability. We propose a novel model architecture, Receptance Weighted Key Value (RWKV), that combines the efficient parallelizable training of transformers with the efficient inference of RNNs. Our approach leverages a linear attention mechanism and allows us to formulate the model as either a Transformer or an RNN, thus parallelizing computations during training and maintains constant computational and memory complexity during inference. We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers, suggesting future work can leverage this architecture to create more efficient models. This work presents a significant step towards reconciling trade-offs between computational efficiency and model performance in sequence processing tasks.

pdf bib
Anchoring Fine-tuning of Sentence Transformer with Semantic Label Information for Efficient Truly Few-shot Classification
Amalie Pauli | Leon Derczynski | Ira Assent
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Few-shot classification is a powerful technique, but training requires substantial computing power and data. We propose an efficient method with small model sizes and less training data with only 2-8 training instances per class. Our proposed method, AncSetFit, targets low data scenarios by anchoring the task and label information through sentence embeddings in fine-tuning a Sentence Transformer model. It uses contrastive learning and a triplet loss to enforce training instances of a class to be closest to its own textual semantic label information in the embedding space - and thereby learning to embed different class instances more distinct. AncSetFit obtains strong performance in data-sparse scenarios compared to existing methods across SST-5, Emotion detection, and AG News data, even with just two examples per class.

pdf bib
Efficient Methods for Natural Language Processing: A Survey
Marcos Treviso | Ji-Ung Lee | Tianchu Ji | Betty van Aken | Qingqing Cao | Manuel R. Ciosici | Michael Hassid | Kenneth Heafield | Sara Hooker | Colin Raffel | Pedro H. Martins | André F. T. Martins | Jessica Zosa Forde | Peter Milder | Edwin Simpson | Noam Slonim | Jesse Dodge | Emma Strubell | Niranjan Balasubramanian | Leon Derczynski | Iryna Gurevych | Roy Schwartz
Transactions of the Association for Computational Linguistics, Volume 11

Recent work in natural language processing (NLP) has yielded appealing results from scaling model parameters and training data; however, using only scale to improve performance means that resource consumption also grows. Such resources include data, time, storage, or energy, all of which are naturally limited and unevenly distributed. This motivates research into efficient methods that require fewer resources to achieve similar results. This survey synthesizes and relates current methods and findings in efficient NLP. We aim to provide both guidance for conducting NLP under limited resources, and point towards promising research directions for developing more efficient methods.

2022

pdf bib
Foreword to NEJLT Volume 8, 2022
Leon Derczynski
Northern European Journal of Language Technology, Volume 8

An introduction to the Northern European Journal of Language Technology in 2022

pdf bib
Handling and Presenting Harmful Text in NLP Research
Hannah Kirk | Abeba Birhane | Bertie Vidgen | Leon Derczynski
Findings of the Association for Computational Linguistics: EMNLP 2022

Text data can pose a risk of harm. However, the risks are not fully understood, and how to handle, present, and discuss harmful text in a safe way remains an unresolved issue in the NLP community. We provide an analytical framework categorising harms on three axes: (1) the harm type (e.g., misinformation, hate speech or racial stereotypes); (2) whether a harm is sought as a feature of the research design if explicitly studying harmful content (e.g., training a hate speech classifier), versus unsought if harmful content is encountered when working on unrelated problems (e.g., language generation or part-of-speech tagging); and (3) who it affects, from people (mis)represented in the data to those handling the data and those publishing on the data. We provide advice for practitioners, with concrete steps for mitigating harm in research and in publication. To assist implementation we introduce HarmCheck – a documentation standard for handling and presenting harmful text in research.

pdf bib
Modelling Persuasion through Misuse of Rhetorical Appeals
Amalie Pauli | Leon Derczynski | Ira Assent
Proceedings of the Second Workshop on NLP for Positive Impact (NLP4PI)

It is important to understand how people use words to persuade each other. This helps understand debate, and detect persuasive narratives in regard to e.g. misinformation. While computational modelling of some aspects of persuasion has received some attention, a way to unify and describe the overall phenomenon of when persuasion becomes undesired and problematic, is missing. In this paper, we attempt to address this by proposing a taxonomy of computational persuasion. Drawing upon existing research and resources, this paper shows how to re-frame and re-organise current work into a coherent framework targeting the misuse of rhetorical appeals. As a study to validate these re-framings, we then train and evaluate models of persuasion adapted to our taxonomy. Our results show an application of our taxonomy, and we are able to detecting misuse of rhetorical appeals, finding that these are more often used in misinformative contexts than in true ones.

2021

pdf bib
Northern European Journal of Language Technology, Volume 7
Leon Derczynski
Northern European Journal of Language Technology, Volume 7

pdf bib
Annotating Online Misogyny
Philine Zeinert | Nanna Inie | Leon Derczynski
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Online misogyny, a category of online abusive language, has serious and harmful social consequences. Automatic detection of misogynistic language online, while imperative, poses complicated challenges to both data gathering, data annotation, and bias mitigation, as this type of data is linguistically complex and diverse. This paper makes three contributions in this area: Firstly, we describe the detailed design of our iterative annotation process and codebook. Secondly, we present a comprehensive taxonomy of labels for annotating misogyny in natural written language, and finally, we introduce a high-quality dataset of annotated posts sampled from social media posts.

pdf bib
The Danish Gigaword Corpus
Leon Strømberg-Derczynski | Manuel Ciosici | Rebekah Baglini | Morten H. Christiansen | Jacob Aarup Dalsgaard | Riccardo Fusaroli | Peter Juel Henrichsen | Rasmus Hvingelby | Andreas Kirkedal | Alex Speed Kjeldsen | Claus Ladefoged | Finn Årup Nielsen | Jens Madsen | Malte Lau Petersen | Jonathan Hvithamar Rystrøm | Daniel Varab
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

Danish language technology has been hindered by a lack of broad-coverage corpora at the scale modern NLP prefers. This paper describes the Danish Gigaword Corpus, the result of a focused effort to provide a diverse and freely-available one billion word corpus of Danish text. The Danish Gigaword corpus covers a wide array of time periods, domains, speakers’ socio-economic status, and Danish dialects.

pdf bib
DanFEVER: claim verification dataset for Danish
Jeppe Nørregaard | Leon Derczynski
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

We present a dataset, DanFEVER, intended for multilingual misinformation research. The dataset is in Danish and has the same format as the well-known English FEVER dataset. It can be used for testing methods in multilingual settings, as well as for creating models in production for the Danish language.

pdf bib
Discriminating Between Similar Nordic Languages
René Haas | Leon Derczynski
Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects

Automatic language identification is a challenging problem. Discriminating between closely related languages is especially difficult. This paper presents a machine learning approach for automatic language identification for the Nordic languages, which often suffer miscategorisation by existing state-of-the-art tools. Concretely we will focus on discrimination between six Nordic languages: Danish, Swedish, Norwegian (Nynorsk), Norwegian (Bokmål), Faroese and Icelandic.

pdf bib
Hyperparameter Power Impact in Transformer Language Model Training
Lucas Høyberg Puvis de Chavannes | Mads Guldborg Kjeldgaard Kongsbak | Timmie Rantzau | Leon Derczynski
Proceedings of the Second Workshop on Simple and Efficient Natural Language Processing

Training large language models can consume a large amount of energy. We hypothesize that the language model’s configuration impacts its energy consumption, and that there is room for power consumption optimisation in modern large language models. To investigate these claims, we introduce a power consumption factor to the objective function, and explore the range of models and hyperparameter configurations that affect power. We identify multiple configuration factors that can reduce power consumption during language model training while retaining model quality.

pdf bib
An IDR Framework of Opportunities and Barriers between HCI and NLP
Nanna Inie | Leon Derczynski
Proceedings of the First Workshop on Bridging Human–Computer Interaction and Natural Language Processing

This paper presents a framework of opportunities and barriers/risks between the two research fields Natural Language Processing (NLP) and Human-Computer Interaction (HCI). The framework is constructed by following an interdisciplinary research-model (IDR), combining field-specific knowledge with existing work in the two fields. The resulting framework is intended as a departure point for discussion and inspiration for research collaborations.

pdf bib
Abusive Language Recognition in Russian
Kamil Saitov | Leon Derczynski
Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing

Abusive phenomena are commonplace in language on the web. The scope of recognizing abusive language is broad, covering many behaviors and forms of expression. This work addresses automatic detection of abusive language in Russian. The lexical, grammatical and morphological diversity of Russian language present potential difficulties for this task, which is addressed using a variety of machine learning approaches. Finally, competitive performance is reached over multiple domains for this investigation into automatic detection of abusive language in Russian.

2020

pdf bib
Detection and Resolution of Rumors and Misinformation with NLP
Leon Derczynski | Arkaitz Zubiaga
Proceedings of the 28th International Conference on Computational Linguistics: Tutorial Abstracts

Detecting and grounding false and misleading claims on the web has grown to form a substantial sub-field of NLP. The sub-field addresses problems at multiple different levels of misinformation detection: identifying check-worthy claims; tracking claims and rumors; rumor collection and annotation; grounding claims against knowledge bases; using stance to verify claims; and applying style analysis to detect deception. This half-day tutorial presents the theory behind each of these steps as well as the state-of-the-art solutions.

pdf bib
SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020)
Marcos Zampieri | Preslav Nakov | Sara Rosenthal | Pepa Atanasova | Georgi Karadzhov | Hamdy Mubarak | Leon Derczynski | Zeses Pitenis | Çağrı Çöltekin
Proceedings of the Fourteenth Workshop on Semantic Evaluation

We present the results and the main findings of SemEval-2020 Task 12 on Multilingual Offensive Language Identification in Social Media (OffensEval-2020). The task included three subtasks corresponding to the hierarchical taxonomy of the OLID schema from OffensEval-2019, and it was offered in five languages: Arabic, Danish, English, Greek, and Turkish. OffensEval-2020 was one of the most popular tasks at SemEval-2020, attracting a large number of participants across all subtasks and languages: a total of 528 teams signed up to participate in the task, 145 teams submitted official runs on the test data, and 70 teams submitted system description papers.

pdf bib
Accelerated High-Quality Mutual-Information Based Word Clustering
Manuel R. Ciosici | Ira Assent | Leon Derczynski
Proceedings of the Twelfth Language Resources and Evaluation Conference

Word clustering groups words that exhibit similar properties. One popular method for this is Brown clustering, which uses short-range distributional information to construct clusters. Specifically, this is a hard hierarchical clustering with a fixed-width beam that employs bi-grams and greedily minimizes global mutual information loss. The result is word clusters that tend to outperform or complement other word representations, especially when constrained by small datasets. However, Brown clustering has high computational complexity and does not lend itself to parallel computation. This, together with the lack of efficient implementations, limits their applicability in NLP. We present efficient implementations of Brown clustering and the alternative Exchange clustering as well as a number of methods to accelerate the computation of both hierarchical and flat clusters. We show empirically that clusters obtained with the accelerated method match the performance of clusters computed using the original methods.

pdf bib
Offensive Language and Hate Speech Detection for Danish
Gudbjartur Ingi Sigurbergsson | Leon Derczynski
Proceedings of the Twelfth Language Resources and Evaluation Conference

The presence of offensive language on social media platforms and the implications this poses is becoming a major concern in modern society. Given the enormous amount of content created every day, automatic methods are required to detect and deal with this type of content. Until now, most of the research has focused on solving the problem for the English language, while the problem is multilingual. We construct a Danish dataset DKhate containing user-generated comments from various social media platforms, and to our knowledge, the first of its kind, annotated for various types and target of offensive language. We develop four automatic classification systems, each designed to work for both the English and the Danish language. In the detection of offensive language in English, the best performing system achieves a macro averaged F1-score of 0.74, and the best performing system for Danish achieves a macro averaged F1-score of 0.70. In the detection of whether or not an offensive post is targeted, the best performing system for English achieves a macro averaged F1-score of 0.62, while the best performing system for Danish achieves a macro averaged F1-score of 0.73. Finally, in the detection of the target type in a targeted offensive post, the best performing system for English achieves a macro averaged F1-score of 0.56, and the best performing system for Danish achieves a macro averaged F1-score of 0.63. Our work for both the English and the Danish language captures the type and targets of offensive language, and present automatic methods for detecting different kinds of offensive language such as hate speech and cyberbullying.

pdf bib
Maintaining Quality in FEVER Annotation
Leon Derczynski | Julie Binau | Henri Schulte
Proceedings of the Third Workshop on Fact Extraction and VERification (FEVER)

We propose two measures for measuring the quality of constructed claims in the FEVER task. Annotating data for this task involves the creation of supporting and refuting claims over a set of evidence. Automatic annotation processes often leave superficial patterns in data, which learning systems can detect instead of performing the underlying task. Humans also can leave these superficial patterns, either voluntarily or involuntarily (due to e.g. fatigue). The two measures introduced attempt to detect the impact of these superficial patterns. One is a new information-theoretic and distributionality based measure, DCI; and the other an extension of neural probing work over the ARCT task, utility. We demonstrate these measures over a recent major dataset, that from the English FEVER task in 2019.

2019

pdf bib
SemEval-2019 Task 7: RumourEval, Determining Rumour Veracity and Support for Rumours
Genevieve Gorrell | Elena Kochkina | Maria Liakata | Ahmet Aker | Arkaitz Zubiaga | Kalina Bontcheva | Leon Derczynski
Proceedings of the 13th International Workshop on Semantic Evaluation

Since the first RumourEval shared task in 2017, interest in automated claim validation has greatly increased, as the danger of “fake news” has become a mainstream concern. However automated support for rumour verification remains in its infancy. It is therefore important that a shared task in this area continues to provide a focus for effort, which is likely to increase. Rumour verification is characterised by the need to consider evolving conversations and news updates to reach a verdict on a rumour’s veracity. As in RumourEval 2017 we provided a dataset of dubious posts and ensuing conversations in social media, annotated both for stance and veracity. The social media rumours stem from a variety of breaking news stories and the dataset is expanded to include Reddit as well as new Twitter posts. There were two concrete tasks; rumour stance prediction and rumour verification, which we present in detail along with results achieved by participants. We received 22 system submissions (a 70% increase from RumourEval 2017) many of which used state-of-the-art methodology to tackle the challenges involved.

pdf bib
Quantifying the morphosyntactic content of Brown Clusters
Manuel R. Ciosici | Leon Derczynski | Ira Assent
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Brown and Exchange word clusters have long been successfully used as word representations in Natural Language Processing (NLP) systems. Their success has been attributed to their seeming ability to represent both semantic and syntactic information. Using corpora representing several language families, we test the hypothesis that Brown and Exchange word clusters are highly effective at encoding morphosyntactic information. Our experiments show that word clusters are highly capable at distinguishing Parts of Speech. We show that increases in Average Mutual Information, the clustering algorithms’ optimization goal, are highly correlated with improvements in encoding of morphosyntactic information. Our results provide empirical evidence that downstream NLP systems addressing tasks dependent on morphosyntactic information can benefit from word cluster features.

pdf bib
Political Stance in Danish
Rasmus Lehmann | Leon Derczynski
Proceedings of the 22nd Nordic Conference on Computational Linguistics

The task of stance detection consists of classifying the opinion within a text towards some target. This paper seeks to generate a dataset of quotes from Danish politicians, label this dataset to allow the task of stance detection to be performed, and present annotation guidelines to allow further expansion of the generated dataset. Furthermore, three models based on an LSTM architecture are designed, implemented and optimized to perform the task of stance detection for the generated dataset. Experiments are performed using conditionality and bi-directionality for these models, and using either singular word embeddings or averaged word embeddings for an entire quote, to determine the optimal model design. The simplest model design, applying neither conditionality or bi-directionality, and averaged word embeddings across quotes, yields the strongest results. Furthermore, it was found that inclusion of the quotes politician, and the party affiliation of the quoted politician, greatly improved performance of the strongest model.

pdf bib
Joint Rumour Stance and Veracity Prediction
Anders Edelbo Lillie | Emil Refsgaard Middelboe | Leon Derczynski
Proceedings of the 22nd Nordic Conference on Computational Linguistics

The net is rife with rumours that spread through microblogs and social media. Not all the claims in these can be verified. However, recent work has shown that the stances alone that commenters take toward claims can be sufficiently good indicators of claim veracity, using e.g. an HMM that takes conversational stance sequences as the only input. Existing results are monolingual (English) and mono-platform (Twitter). This paper introduces a stance-annotated Reddit dataset for the Danish language, and describes various implementations of stance classification models. Of these, a Linear SVM provides predicts stance best, with 0.76 accuracy / 0.42 macro F1. Stance labels are then used to predict veracity across platforms and also across languages, training on conversations held in one language and using the model on conversations held in another. In our experiments, monolinugal scores reach stance-based veracity accuracy of 0.83 (F1 0.68); applying the model across languages predicts veracity of claims with an accuracy of 0.82 (F1 0.67). This demonstrates the surprising and powerful viability of transferring stance-based veracity prediction across languages.

pdf bib
Bornholmsk Natural Language Processing: Resources and Tools
Leon Derczynski | Alex Speed Kjeldsen
Proceedings of the 22nd Nordic Conference on Computational Linguistics

This paper introduces language processing resources and tools for Bornholmsk, a language spoken on the island of Bornholm, with roots in Danish and closely related to Scanian. This presents an overview of the language and available data, and the first NLP models for this living, minority Nordic language. Sammenfattnijng på borrijnholmst: Dæjnna artikkelijn introduserer natursprågsresurser å varktoi for borrijnholmst, ed språg a dær snakkes på ön Borrijnholm me rødder i danst å i nær familia me skånst. Artikkelijn gjer ed âuersyn âuer språged å di datan som fijnnes, å di fosste NLP modællarna for dætta læwenes nordiska minnretâlsspråged.

pdf bib
The Lacunae of Danish Natural Language Processing
Andreas Kirkedal | Barbara Plank | Leon Derczynski | Natalie Schluter
Proceedings of the 22nd Nordic Conference on Computational Linguistics

Danish is a North Germanic language spoken principally in Denmark, a country with a long tradition of technological and scientific innovation. However, the language has received relatively little attention from a technological perspective. In this paper, we review Natural Language Processing (NLP) research, digital resources and tools which have been developed for Danish. We find that availability of models and tools is limited, which calls for work that lifts Danish NLP a step closer to the privileged languages. Dansk abstrakt: Dansk er et nordgermansk sprog, talt primært i kongeriget Danmark, et land med stærk tradition for teknologisk og videnskabelig innovation. Det danske sprog har imidlertid været genstand for relativt begrænset opmærksomhed, teknologisk set. I denne artikel gennemgår vi sprogteknologi-forskning, -ressourcer og -værktøjer udviklet for dansk. Vi konkluderer at der eksisterer et fåtal af modeller og værktøjer, hvilket indbyder til forskning som løfter dansk sprogteknologi i niveau med mere priviligerede sprog.

pdf bib
Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing
Joakim Nivre | Leon Derczynski | Filip Ginter | Bjørn Lindi | Stephan Oepen | Anders Søgaard | Jörg Tidemann
Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing

2018

pdf bib
Proceedings of the 27th International Conference on Computational Linguistics
Emily M. Bender | Leon Derczynski | Pierre Isabelle
Proceedings of the 27th International Conference on Computational Linguistics

pdf bib
IUCM at SemEval-2018 Task 11: Similar-Topic Texts as a Comprehension Knowledge Source
Sofia Reznikova | Leon Derczynski
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper describes the IUCM entry at SemEval-2018 Task 11, on machine comprehension using commonsense knowledge. First, clustering and topic modeling are used to divide given texts into topics. Then, during the answering phase, other texts of the same topic are retrieved and used as commonsense knowledge. Finally, the answer is selected. While clustering itself shows good results, finding an answer proves to be more challenging. This paper reports the results of system evaluation and suggests potential improvements.

2017

pdf bib
SemEval-2017 Task 8: RumourEval: Determining rumour veracity and support for rumours
Leon Derczynski | Kalina Bontcheva | Maria Liakata | Rob Procter | Geraldine Wong Sak Hoi | Arkaitz Zubiaga
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

Media is full of false claims. Even Oxford Dictionaries named “post-truth” as the word of 2016. This makes it more important than ever to build systems that can identify the veracity of a story, and the nature of the discourse around it. RumourEval is a SemEval shared task that aims to identify and handle rumours and reactions to them, in text. We present an annotation scheme, a large dataset covering multiple topics – each having their own families of claims and replies – and use these to pose two concrete challenges as well as the results achieved by participants on these challenges.

pdf bib
Simple Open Stance Classification for Rumour Analysis
Ahmet Aker | Leon Derczynski | Kalina Bontcheva
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

Stance classification determines the attitude, or stance, in a (typically short) text. The task has powerful applications, such as the detection of fake news or the automatic extraction of attitudes toward entities or events in the media. This paper describes a surprisingly simple and efficient classification approach to open stance classification in Twitter, for rumour and veracity classification. The approach profits from a novel set of automatically identifiable problem-specific features, which significantly boost classifier accuracy and achieve above state-of-the-art results on recent benchmark datasets. This calls into question the value of using complex sophisticated models for stance classification without first doing informed feature extraction.

pdf bib
Proceedings of the 3rd Workshop on Noisy User-generated Text
Leon Derczynski | Wei Xu | Alan Ritter | Tim Baldwin
Proceedings of the 3rd Workshop on Noisy User-generated Text

pdf bib
Results of the WNUT2017 Shared Task on Novel and Emerging Entity Recognition
Leon Derczynski | Eric Nichols | Marieke van Erp | Nut Limsopatham
Proceedings of the 3rd Workshop on Noisy User-generated Text

This shared task focuses on identifying unusual, previously-unseen entities in the context of emerging discussions. Named entities form the basis of many modern approaches to other tasks (like event clustering and summarization), but recall on them is a real problem in noisy text - even among annotators. This drop tends to be due to novel entities and surface forms. Take for example the tweet “so.. kktny in 30 mins?!” – even human experts find the entity ‘kktny’ hard to detect and resolve. The goal of this task is to provide a definition of emerging and of rare entities, and based on that, also datasets for detecting these entities. The task as described in this paper evaluated the ability of participating entries to detect and classify novel and emerging named entities in noisy text.

2016

pdf bib
SemEval-2016 Task 12: Clinical TempEval
Steven Bethard | Guergana Savova | Wei-Te Chen | Leon Derczynski | James Pustejovsky | Marc Verhagen
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf bib
Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)
Bo Han | Alan Ritter | Leon Derczynski | Wei Xu | Tim Baldwin
Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)

pdf bib
Twitter Geolocation Prediction Shared Task of the 2016 Workshop on Noisy User-generated Text
Bo Han | Afshin Rahimi | Leon Derczynski | Timothy Baldwin
Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)

This paper presents the shared task for English Twitter geolocation prediction in WNUT 2016. We discuss details of task settings, data preparations and participant systems. The derived dataset and performance figures from each system provide baselines for future research in this realm.

pdf bib
Complementarity, F-score, and NLP Evaluation
Leon Derczynski
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper addresses the problem of quantifying the differences between entity extraction systems, where in general only a small proportion a document should be selected. Comparing overall accuracy is not very useful in these cases, as small differences in accuracy may correspond to huge differences in selections over the target minority class. Conventionally, one may use per-token complementarity to describe these differences, but it is not very useful when the set is heavily skewed. In such situations, which are common in information retrieval and entity recognition, metrics like precision and recall are typically used to describe performance. However, precision and recall fail to describe the differences between sets of objects selected by different decision strategies, instead just describing the proportional amount of correct and incorrect objects selected. This paper presents a method for measuring complementarity for precision, recall and F-score, quantifying the difference between entity extraction approaches.

pdf bib
GATE-Time: Extraction of Temporal Expressions and Events
Leon Derczynski | Jannik Strötgen | Diana Maynard | Mark A. Greenwood | Manuel Jung
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

GATE is a widely used open-source solution for text processing with a large user community. It contains components for several natural language processing tasks. However, temporal information extraction functionality within GATE has been rather limited so far, despite being a prerequisite for many application scenarios in the areas of natural language processing and information retrieval. This paper presents an integrated approach to temporal information processing. We take state-of-the-art tools in temporal expression and event recognition and bring them together to form an openly-available resource within the GATE infrastructure. GATE-Time provides annotation in the form of TimeML events and temporal expressions complying with this mature ISO standard for temporal semantic annotation of documents. Major advantages of GATE-Time are (i) that it relies on HeidelTime for temporal tagging, so that temporal expressions can be extracted and normalized in multiple languages and across different domains, (ii) it includes a modern, fast event recognition and classification tool, and (iii) that it can be combined with different linguistic pre-processing annotations, and is thus not bound to license restricted preprocessing components.

pdf bib
Broad Twitter Corpus: A Diverse Named Entity Recognition Resource
Leon Derczynski | Kalina Bontcheva | Ian Roberts
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

One of the main obstacles, hampering method development and comparative evaluation of named entity recognition in social media, is the lack of a sizeable, diverse, high quality annotated corpus, analogous to the CoNLL’2003 news dataset. For instance, the biggest Ritter tweet corpus is only 45,000 tokens – a mere 15% the size of CoNLL’2003. Another major shortcoming is the lack of temporal, geographic, and author diversity. This paper introduces the Broad Twitter Corpus (BTC), which is not only significantly bigger, but sampled across different regions, temporal periods, and types of Twitter users. The gold-standard named entity annotations are made by a combination of NLP experts and crowd workers, which enables us to harness crowd recall while maintaining high quality. We also measure the entity drift observed in our dataset (i.e. how entity representation varies over time), and compare to newswire. The corpus is released openly, including source text and intermediate annotations.

pdf bib
Representation and Learning of Temporal Relations
Leon Derczynski
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Determining the relative order of events and times described in text is an important problem in natural language processing. It is also a difficult one: general state-of-the-art performance has been stuck at a relatively low ceiling for years. We investigate the representation of temporal relations, and empirically evaluate the effect that various temporal relation representations have on machine learning performance. While machine learning performance decreases with increased representational expressiveness, not all representation simplifications have equal impact.

2015

pdf bib
Swiss-Chocolate: Combining Flipout Regularization and Random Forests with Artificially Built Subsystems to Boost Text-Classification for Sentiment
Fatih Uzdilli | Martin Jaggi | Dominic Egger | Pascal Julmy | Leon Derczynski | Mark Cieliebak
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf bib
SemEval-2015 Task 6: Clinical TempEval
Steven Bethard | Leon Derczynski | Guergana Savova | James Pustejovsky | Marc Verhagen
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf bib
UFPRSheffield: Contrasting Rule-based and Support Vector Machine Approaches to Time Expression Identification in Clinical TempEval
Hegler Tissot | Genevieve Gorrell | Angus Roberts | Leon Derczynski | Marcos Didonet Del Fabro
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf bib
Tune Your Brown Clustering, Please
Leon Derczynski | Sean Chester | Kenneth Bøgh
Proceedings of the International Conference Recent Advances in Natural Language Processing

pdf bib
Temporal Relation Classification using a Model of Tense and Aspect
Leon Derczynski | Robert Gaizauskas
Proceedings of the International Conference Recent Advances in Natural Language Processing

pdf bib
Efficient Named Entity Annotation through Pre-empting
Leon Derczynski | Kalina Bontcheva
Proceedings of the International Conference Recent Advances in Natural Language Processing

pdf bib
Analysis of Temporal Expressions Annotated in Clinical Notes
Hegler Tissot | Angus Roberts | Leon Derczynski | Genevieve Gorrell | Marcus Didonet Del Fabro
Proceedings of the 11th Joint ACL-ISO Workshop on Interoperable Semantic Annotation (ISA-11)

pdf bib
USFD: Twitter NER with Drift Compensation and Linked Data
Leon Derczynski | Isabelle Augenstein | Kalina Bontcheva
Proceedings of the Workshop on Noisy User-generated Text

pdf bib
Handling and Mining Linguistic Variation in UGC
Leon Derczynski
Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects

2014

pdf bib
Corpus Annotation through Crowdsourcing: Towards Best Practice Guidelines
Marta Sabou | Kalina Bontcheva | Leon Derczynski | Arno Scharl
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Crowdsourcing is an emerging collaborative approach that can be used for the acquisition of annotated corpora and a wide range of other linguistic resources. Although the use of this approach is intensifying in all its key genres (paid-for crowdsourcing, games with a purpose, volunteering-based approaches), the community still lacks a set of best-practice guidelines similar to the annotation best practices for traditional, expert-based corpus acquisition. In this paper we focus on the use of crowdsourcing methods for corpus acquisition and propose a set of best practice guidelines based in our own experiences in this area and an overview of related literature. We also introduce GATE Crowd, a plugin of the GATE platform that relies on these guidelines and offers tool support for using crowdsourcing in a more principled and efficient manner.

pdf bib
DKIE: Open Source Information Extraction for Danish
Leon Derczynski | Camilla Vilhelmsen Field | Kenneth S. Bøgh
Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
The GATE Crowdsourcing Plugin: Crowdsourcing Annotated Corpora Made Easy
Kalina Bontcheva | Ian Roberts | Leon Derczynski | Samantha Alexander-Eames
Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Recognising Person Entities in Tweets
Leon Derczynski | Kalina Bontcheva
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

2013

pdf bib
Temporal Signals Help Label Temporal Relations
Leon Derczynski | Robert Gaizauskas
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Empirical Validation of Reichenbach’s Tense Framework
Leon Derczynski | Robert Gaizauskas
Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) – Long Papers

pdf bib
TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
Kalina Bontcheva | Leon Derczynski | Adam Funk | Mark Greenwood | Diana Maynard | Niraj Aswani
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

pdf bib
Recognising and Interpreting Named Temporal Expressions
Matteo Brucato | Leon Derczynski | Hector Llorens | Kalina Bontcheva | Christian S. Jensen
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

pdf bib
Twitter Part-of-Speech Tagging for All: Overcoming Sparse and Noisy Data
Leon Derczynski | Alan Ritter | Sam Clark | Kalina Bontcheva
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

pdf bib
SemEval-2013 Task 1: TempEval-3: Evaluating Time Expressions, Events, and Temporal Relations
Naushad UzZaman | Hector Llorens | Leon Derczynski | James Allen | Marc Verhagen | James Pustejovsky
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

2012

pdf bib
TIMEN: An Open Temporal Expression Normalisation Resource
Hector Llorens | Leon Derczynski | Robert Gaizauskas | Estela Saquete
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Temporal expressions are words or phrases that describe a point, duration or recurrence in time. Automatically annotating these expressions is a research goal of increasing interest. Recognising them can be achieved with minimally supervised machine learning, but interpreting them accurately (normalisation) is a complex task requiring human knowledge. In this paper, we present TIMEN, a community-driven tool for temporal expression normalisation. TIMEN is derived from current best approaches and is an independent tool, enabling easy integration in existing systems. We argue that temporal expression normalisation can only be effectively performed with a large knowledge base and set of rules. Our solution is a framework and system with which to capture this knowledge for different languages. Using both existing and newly-annotated data, we present results showing competitive performance and invite the IE community to contribute to a knowledge base in order to solve the temporal expression normalisation problem.

pdf bib
Massively Increasing TIMEX3 Resources: A Transduction Approach
Leon Derczynski | Héctor Llorens | Estela Saquete
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Automatic annotation of temporal expressions is a research challenge of great interest in the field of information extraction. Gold standard temporally-annotated resources are limited in size, which makes research using them difficult. Standards have also evolved over the past decade, so not all temporally annotated data is in the same format. We vastly increase available human-annotated temporal expression resources by converting older format resources to TimeML/TIMEX3. This task is difficult due to differing annotation methods. We present a robust conversion tool and a new, large temporal expression resource. Using this, we evaluate our conversion process by using it as training data for an existing TimeML annotation tool, achieving a 0.87 F1 measure - better than any system in the TempEval-2 timex recognition exercise.

2010

pdf bib
USFD2: Annotating Temporal Expresions and TLINKs for TempEval-2
Leon Derczynski | Robert Gaizauskas
Proceedings of the 5th International Workshop on Semantic Evaluation

pdf bib
Analysing Temporally Annotated Corpora with CAVaT
Leon Derczynski | Robert Gaizauskas
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We present CAVaT, a tool that performs Corpus Analysis and Validation for TimeML. CAVaT is an open source, modular checking utility for statistical analysis of features specific to temporally-annotated natural language corpora. It provides reporting, highlights salient links between a variety of general and time-specific linguistic features, and also validates a temporal annotation to ensure that it is logically consistent and sufficiently annotated. Uniquely, CAVaT provides analysis specific to TimeML-annotated temporal information. TimeML is a standard for annotating temporal information in natural language text. In this paper, we present the reporting part of CAVaT, and then its error-checking ability, including the workings of several novel TimeML document verification methods. This is followed by the execution of some example tasks using the tool to show relations between times, events, signals and links. We also demonstrate inconsistencies in a TimeML corpus (TimeBank) that have been detected with CAVaT.

2008

pdf bib
A Data Driven Approach to Query Expansion in Question Answering
Leon Derczynski | Jun Wang | Robert Gaizauskas | Mark A. Greenwood
Coling 2008: Proceedings of the 2nd workshop on Information Retrieval for Question Answering

Search
Co-authors