Arno Scharl


2024

An Efficient Workflow Towards Improving Classifiers in Low-Resource Settings with Synthetic Data
Adrian M.P. Braşoveanu | Albert Weichselbraun | Lyndon J.B. Nixon | Arno Scharl
Proceedings of the 9th edition of the Swiss Text Analytics Conference

2016

A Regional News Corpora for Contextualized Entity Discovery and Linking
Adrian Braşoveanu | Lyndon J.B. Nixon | Albert Weichselbraun | Arno Scharl
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents a German corpus for Named Entity Linking (NEL) and Knowledge Base Population (KBP) tasks. We describe the annotation guidelines, the annotation process, the NIL clustering techniques, and the conversion to popular NEL formats such as NIF and TAC that were used to construct this corpus from news transcripts of the German regional broadcaster RBB (Rundfunk Berlin Brandenburg). Since creating such language resources requires significant effort, the paper also discusses how to derive additional evaluation resources for tasks such as named entity contextualization or ontology enrichment by exploiting the links between named entities in the annotated corpus. The paper concludes with an evaluation of how several well-known NEL tools perform on the corpus, a discussion of the evaluation results, and suggestions on how to keep evaluation corpora and datasets up to date.

2014

Corpus Annotation through Crowdsourcing: Towards Best Practice Guidelines
Marta Sabou | Kalina Bontcheva | Leon Derczynski | Arno Scharl
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Crowdsourcing is an emerging collaborative approach that can be used to acquire annotated corpora and a wide range of other linguistic resources. Although the use of this approach is intensifying across all its key genres (paid-for crowdsourcing, games with a purpose, and volunteering-based approaches), the community still lacks best-practice guidelines comparable to the annotation best practices established for traditional, expert-based corpus acquisition. In this paper we focus on crowdsourcing methods for corpus acquisition and propose a set of best-practice guidelines based on our own experience in this area and on an overview of the related literature. We also introduce GATE Crowd, a plugin for the GATE platform that builds on these guidelines and offers tool support for using crowdsourcing in a more principled and efficient manner.

2012

Leveraging the Wisdom of the Crowds for the Acquisition of Multilingual Language Resources
Arno Scharl | Marta Sabou | Stefan Gindl | Walter Rafelsberger | Albert Weichselbraun
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Games with a purpose are an increasingly popular mechanism for leveraging the wisdom of the crowds to address tasks that are trivial for humans but that computer algorithms still cannot solve satisfactorily. Because such games are a novel mechanism for structuring human-computer interaction, a key challenge in creating them is motivating users to participate while generating useful and unbiased results. This paper focuses on important design choices and success factors of effective games with a purpose. Our findings are based on lessons learned while developing and deploying Sentiment Quiz, a crowdsourcing application for creating sentiment lexicons (an essential component of most sentiment detection algorithms). We describe the goals and structure of the game, the underlying application framework, and the sentiment lexicons gathered through crowdsourcing, as well as a novel approach for automatically extending the lexicons through a bootstrapping process. This automated extension further increases the efficiency of the acquisition process by limiting the number of terms that must be gathered from game participants.

2006

Web coverage of the 2004 US Presidential election
Arno Scharl | Albert Weichselbraun
Proceedings of the 2nd International Workshop on Web as Corpus