Don Tuggener


2020

pdf bib
LEDGAR: A Large-Scale Multi-label Corpus for Text Classification of Legal Provisions in Contracts
Don Tuggener | Pius von Däniken | Thomas Peetz | Mark Cieliebak
Proceedings of the 12th Language Resources and Evaluation Conference

We present LEDGAR, a multilabel corpus of legal provisions in contracts. The corpus was crawled and scraped from the public domain (SEC filings) and is, to the best of our knowledge, the first freely available corpus of its kind. Since the corpus was constructed semi-automatically, we apply and discuss various approaches to noise removal. Due to the rather large labelset of over 12’000 labels annotated in almost 100’000 provisions in over 60’000 contracts, we believe the corpus to be of interest for research in the field of Legal NLP, (large-scale or extreme) text classification, as well as for legal studies. We discuss several methods to sample subcopora from the corpus and implement and evaluate different automatic classification approaches. Finally, we perform transfer experiments to evaluate how well the classifiers perform on contracts stemming from outside the corpus.

pdf bib
Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems
Jan Deriu | Don Tuggener | Pius von Däniken | Jon Ander Campos | Alvaro Rodrigo | Thiziri Belkacem | Aitor Soroa | Eneko Agirre | Mark Cieliebak
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The lack of time efficient and reliable evalu-ation methods is hampering the development of conversational dialogue systems (chat bots). Evaluations that require humans to converse with chat bots are time and cost intensive, put high cognitive demands on the human judges, and tend to yield low quality results. In this work, we introduce Spot The Bot, a cost-efficient and robust evaluation framework that replaces human-bot conversations with conversations between bots. Human judges then only annotate for each entity in a conversation whether they think it is human or not (assuming there are humans participants in these conversations). These annotations then allow us to rank chat bots regarding their ability to mimic conversational behaviour of humans. Since we expect that all bots are eventually recognized as such, we incorporate a metric that measures which chat bot is able to uphold human-like be-havior the longest, i.e.Survival Analysis. This metric has the ability to correlate a bot’s performance to certain of its characteristics (e.g.fluency or sensibleness), yielding interpretable results. The comparably low cost of our frame-work allows for frequent evaluations of chatbots during their evaluation cycle. We empirically validate our claims by applying Spot The Bot to three domains, evaluating several state-of-the-art chat bots, and drawing comparisonsto related work. The framework is released asa ready-to-use tool.

2018

pdf bib
SB-CH: A Swiss German Corpus with Sentiment Annotations
Ralf Grubenmann | Don Tuggener | Pius von Däniken | Jan Deriu | Mark Cieliebak
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf bib
Stance Detection in Facebook Posts of a German Right-wing Party
Manfred Klenner | Don Tuggener | Simon Clematide
Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics

We argue that in order to detect stance, not only the explicit attitudes of the stance holder towards the targets are crucial. It is the whole narrative the writer drafts that counts, including the way he hypostasizes the discourse referents: as benefactors or villains, as victims or beneficiaries. We exemplify the ability of our system to identify targets and detect the writer’s stance towards them on the basis of about 100 000 Facebook posts of a German right-wing party. A reader and writer model on top of our verb-based attitude extraction directly reveal stance conflicts.

pdf bib
A method for in-depth comparative evaluation: How (dis)similar are outputs of pos taggers, dependency parsers and coreference resolvers really?
Don Tuggener
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

This paper proposes a generic method for the comparative evaluation of system outputs. The approach is able to quantify the pairwise differences between two outputs and to unravel in detail what the differences consist of. We apply our approach to three tasks in Computational Linguistics, i.e. POS tagging, dependency parsing, and coreference resolution. We find that system outputs are more distinct than the (often) small differences in evaluation scores seem to suggest.

pdf bib
Machine Translation of Spanish Personal and Possessive Pronouns Using Anaphora Probabilities
Ngoc Quang Luong | Andrei Popescu-Belis | Annette Rios Gonzales | Don Tuggener
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

We implement a fully probabilistic model to combine the hypotheses of a Spanish anaphora resolution system with those of a Spanish-English machine translation system. The probabilities over antecedents are converted into probabilities for the features of translated pronouns, and are integrated with phrase-based MT using an additional translation model for pronouns. The system improves the translation of several Spanish personal and possessive pronouns into English, by solving translation divergencies such as ‘ella’ vs. ‘she’/‘it’ or ‘su’ vs. ‘his’/‘her’/‘its’/‘their’. On a test set with 2,286 pronouns, a baseline system correctly translates 1,055 of them, while ours improves this by 41. Moreover, with oracle antecedents, possessives are translated with an accuracy of 83%.

pdf bib
Co-reference Resolution of Elided Subjects and Possessive Pronouns in Spanish-English Statistical Machine Translation
Annette Rios Gonzales | Don Tuggener
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

This paper presents a straightforward method to integrate co-reference information into phrase-based machine translation to address the problems of i) elided subjects and ii) morphological underspecification of pronouns when translating from pro-drop languages. We evaluate the method for the language pair Spanish-English and find that translation quality improves with the addition of co-reference information.

2014

pdf bib
Coreference Resolution Evaluation for Higher Level Applications
Don Tuggener
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

2013

pdf bib
UZH in BioNLP 2013
Gerold Schneider | Simon Clematide | Tilia Ellendorff | Don Tuggener | Fabio Rinaldi | Gintarė Grigonytė
Proceedings of the BioNLP Shared Task 2013 Workshop

2011

pdf bib
An Incremental Model for the Coreference Resolution Task of BioNLP 2011
Don Tuggener | Manfred Klenner | Gerold Schneider | Simon Clematide | Fabio Rinaldi
Proceedings of BioNLP Shared Task 2011 Workshop

pdf bib
An Incremental Model for Coreference Resolution with Restrictive Antecedent Accessibility
Manfred Klenner | Don Tuggener
Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task

pdf bib
An Incremental Entity-Mention Model for Coreference Resolution with Restrictive Antecedent Accessibility
Manfred Klenner | Don Tuggener
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011