Jan-Christoph Klie


2022

pdf bib
Annotation Curricula to Implicitly Train Non-Expert Annotators
Ji-Ung Lee | Jan-Christoph Klie | Iryna Gurevych
Computational Linguistics, Volume 48, Issue 2 - June 2022

Annotation studies often require annotators to familiarize themselves with the task, its annotation scheme, and the data domain. This can be overwhelming in the beginning, mentally taxing, and induce errors into the resulting annotations; especially in citizen science or crowdsourcing scenarios where domain expertise is not required. To alleviate these issues, this work proposes annotation curricula, a novel approach to implicitly train annotators. The goal is to gradually introduce annotators into the task by ordering instances to be annotated according to a learning curriculum. To do so, this work formalizes annotation curricula for sentence- and paragraph-level annotation tasks, defines an ordering strategy, and identifies well-performing heuristics and interactively trained models on three existing English datasets. Finally, we provide a proof of concept for annotation curricula in a carefully designed user study with 40 voluntary participants who are asked to identify the most fitting misconception for English tweets about the Covid-19 pandemic. The results indicate that using a simple heuristic to order instances can already significantly reduce the total annotation time while preserving a high annotation quality. Annotation curricula thus can be a promising research direction to improve data collection. To facilitate future research—for instance, to adapt annotation curricula to specific tasks and expert annotation scenarios—all code and data from the user study consisting of 2,400 annotations is made available.1

2021

pdf bib
Human-In-The-LoopEntity Linking for Low Resource Domains
Jan-Christoph Klie | Richard Eckart de Castilho | Iryna Gurevych
Proceedings of the Second Workshop on Data Science with Human in the Loop: Language Advances

Entity linking (EL) is concerned with disambiguating entity mentions in a text against knowledge bases (KB). To quickly annotate texts with EL even in low-resource domains and noisy text, we present a novel Human-In-The-Loop EL approach. We show that it greatly outperforms a strong baseline in simulation. In a user study, annotation time is reduced by 35 % compared to annotating without interactive support; users report that they strongly prefer our system over ones without. An open-source and ready-to-use implementation based on the text annotation platform is made available.

2020

pdf bib
From Zero to Hero: Human-In-The-Loop Entity Linking in Low Resource Domains
Jan-Christoph Klie | Richard Eckart de Castilho | Iryna Gurevych
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Entity linking (EL) is concerned with disambiguating entity mentions in a text against knowledge bases (KB). It is crucial in a considerable number of fields like humanities, technical writing and biomedical sciences to enrich texts with semantics and discover more knowledge. The use of EL in such domains requires handling noisy texts, low resource settings and domain-specific KBs. Existing approaches are mostly inappropriate for this, as they depend on training data. However, in the above scenario, there exists hardly annotated data, and it needs to be created from scratch. We therefore present a novel domain-agnostic Human-In-The-Loop annotation approach: we use recommenders that suggest potential concepts and adaptive candidate ranking, thereby speeding up the overall annotation process and making it less tedious for users. We evaluate our ranking approach in a simulation on difficult texts and show that it greatly outperforms a strong baseline in ranking accuracy. In a user study, the annotation speed improves by 35% compared to annotating without interactive support; users report that they strongly prefer our system. An open-source and ready-to-use implementation based on the text annotation platform INCEpTION (https://inception-project.github.io) is made available.

2019

pdf bib
A Multi-Platform Annotation Ecosystem for Domain Adaptation
Richard Eckart de Castilho | Nancy Ide | Jin-Dong Kim | Jan-Christoph Klie | Keith Suderman
Proceedings of the 13th Linguistic Annotation Workshop

This paper describes an ecosystem consisting of three independent text annotation platforms. To demonstrate their ability to work in concert, we illustrate how to use them to address an interactive domain adaptation task in biomedical entity recognition. The platforms and the approach are in general domain-independent and can be readily applied to other areas of science.

2018

pdf bib
Multimodal Frame Identification with Multilingual Evaluation
Teresa Botschen | Iryna Gurevych | Jan-Christoph Klie | Hatem Mousselly-Sergieh | Stefan Roth
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

An essential step in FrameNet Semantic Role Labeling is the Frame Identification (FrameId) task, which aims at disambiguating a situation around a predicate. Whilst current FrameId methods rely on textual representations only, we hypothesize that FrameId can profit from a richer understanding of the situational context. Such contextual information can be obtained from common sense knowledge, which is more present in images than in text. In this paper, we extend a state-of-the-art FrameId system in order to effectively leverage multimodal representations. We conduct a comprehensive evaluation on the English FrameNet and its German counterpart SALSA. Our analysis shows that for the German data, textual representations are still competitive with multimodal ones. However on the English data, our multimodal FrameId approach outperforms its unimodal counterpart, setting a new state of the art. Its benefits are particularly apparent in dealing with ambiguous and rare instances, the main source of errors of current systems. For research purposes, we release (a) the implementation of our system, (b) our evaluation splits for SALSA 2.0, and (c) the embeddings for synsets and IMAGINED words.

pdf bib
Integrating Knowledge-Supported Search into the INCEpTION Annotation Platform
Beto Boullosa | Richard Eckart de Castilho | Naveen Kumar | Jan-Christoph Klie | Iryna Gurevych
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Annotating entity mentions and linking them to a knowledge resource are essential tasks in many domains. It disambiguates mentions, introduces cross-document coreferences, and the resources contribute extra information, e.g. taxonomic relations. Such tasks benefit from text annotation tools that integrate a search which covers the text, the annotations, as well as the knowledge resource. However, to the best of our knowledge, no current tools integrate knowledge-supported search as well as entity linking support. We address this gap by introducing knowledge-supported search functionality into the INCEpTION text annotation platform. In our approach, cross-document references are created by linking entity mentions to a knowledge base in the form of a structured hierarchical vocabulary. The resulting annotations are then indexed to enable fast and yet complex queries taking into account the text, the annotations, and the vocabulary structure.

pdf bib
The INCEpTION Platform: Machine-Assisted and Knowledge-Oriented Interactive Annotation
Jan-Christoph Klie | Michael Bugert | Beto Boullosa | Richard Eckart de Castilho | Iryna Gurevych
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

We introduce INCEpTION, a new annotation platform for tasks including interactive and semantic annotation (e.g., concept linking, fact linking, knowledge base population, semantic frame annotation). These tasks are very time consuming and demanding for annotators, especially when knowledge bases are used. We address these issues by developing an annotation platform that incorporates machine learning capabilities which actively assist and guide annotators. The platform is both generic and modular. It targets a range of research domains in need of semantic annotation, such as digital humanities, bioinformatics, or linguistics. INCEpTION is publicly available as open-source software.