Sandra Kübler

Also published as: Sandra Kubler, Sandra Kuebler

2024

pdf bib abs
SemEval Task 8: A Comparison of Traditional and Neural Models for Detecting Machine Authored Text
Srikar Kashyap Pulipaka | Shrirang Mhalgi | Joseph Larson | Sandra Kübler
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

Since Large Language Models have reached a stage where it is becoming more and more difficult to distinguish between human and machine written text, there is an increasing need for automated systems to distinguish between them. As part of SemEval Task 8, Subtask A: Binary Human-Written vs. Machine-Generated Text Classification, we explore a variety of machine learning classifiers, from traditional statistical methods, such as Naïve Bayes and Decision Trees, to fine-tuned transformer models, suchas RoBERTa and ALBERT. Our findings show that using a fine-tuned RoBERTa model with optimizedhyperparameters yields the best accuracy. However, the improvement does not translate to the test set because of the differences in distribution in the development and test sets.

pdf bib abs
Bits and Pieces: Investigating the Effects of Subwords in Multi-task Parsing across Languages and Domains
Daniel Dakota | Sandra Kübler
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Neural parsing is very dependent on the underlying language model. However, very little is known about how choices in the language model affect parsing performance, especially in multi-task learning. We investigate questions on how the choice of subwords affects parsing, how subword sharing is responsible for gains or negative transfer in a multi-task setting where each task is parsing of a specific domain of the same language. More specifically, we investigate these issues across four languages: English, German, Italian, and Turkish. We find a general preference for averaged or last subwords across languages and domains. However, specific POS tags may require different subwords, and the distributional overlap between subwords across domains is perhaps a more influential factor in determining positive or negative transfer than discrepancies in the data sizes.

pdf bib abs
Universal Dependencies for Saraiki
Meesum Alam | Francis Tyers | Emily Hanink | Sandra Kübler
Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024

We present the first treebank of the Saraiki/Siraiki [ISO 639-3 skr] language, using the Universal Dependency annotation scheme (de Marneffe et al., 2021). The treebank currently comprises 587 annotated sentences and 7597 tokens. We explain the most relevant syntactic and morphological features of Saraiki, along with the decision we have made for a range of language specific constructions, namely compounds, verbal structures including light verb and serial verb constructions, and relative clauses.

pdf bib abs
Domain-Weighted Batch Sampling for Neural Dependency Parsing
Jacob Striebel | Daniel Dakota | Sandra Kübler
Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024

In neural dependency parsing, as well as in the broader field of NLP, domain adaptation remains a challenging problem. When adapting a parser to a target domain, there is a fundamental tension between the need to make use of out-of-domain data and the need to ensure that syntactic characteristic of the target domain are learned. In this work we explore a way to balance these two competing concerns, namely using domain-weighted batch sampling, which allows us to use all available training data, while controlling the probability of sampling in- and out-of-domain data when constructing training batches. We conduct experiments using ten natural language domains and find that domain-weighted batch sampling yields substantial performance improvements in all ten domains compared to a baseline of conventional randomized batch sampling.

pdf bib abs
Scaling Up Authorship Attribution
Jacob Striebel | Abishek Edikala | Ethan Irby | Alex Rosenfeld | J. Gage | Daniel Dakota | Sandra Kübler
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track)

We describe our system for authorship attribution in the IARPA HIATUS program. We describe the model and compute infrastructure developed to satisfy the set of technical constraints imposed by IARPA, including runtime limits as well as other constraints related to the ultimate use case. One use-case constraint concerns the explainability of the features used in the system. For this reason, we integrate features from frame semantic parsing, as they are both interpretable and difficult for adversaries to evade. One trade-off with using such features, however, is that more sophisticated feature representations require more complicated architectures, which limit usefulness in time-sensitive and constrained compute environments. We propose an approach to increase the efficiency of frame semantic parsing through an analysis of parallelization and beam search sizes. Our approach results in a system that is approximately 8.37x faster than the base system with a minimal effect on accuracy.

We often assume that annotation tasks, such as annotating for the presence of conspiracy theories, can be annotated with hard labels, without definitions or guidelines. Our annotation experiments, comparing students and experts, show that there is little agreement on basic annotations even among experts. For this reason, we conclude that we need to accept disagreement as an integral part of such annotations.

2023

pdf bib abs
Was That a Question? Automatic Classification of Discourse Meaning in Spanish
Santiago Arróniz | Sandra Kübler
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

This paper examines the effectiveness of different feature representations of audio data in accurately classifying discourse meaning in Spanish. The task involves determining whether an utterance is a declarative sentence, an interrogative, an imperative, etc. We explore how pitch contour can be represented for a discourse-meaning classification task, employing three different audio features: MFCCs, Mel-scale spectrograms, and chromagrams. We also determine if utilizing means is more effective in representing the speech signal, given the large number of coefficients produced during the feature extraction process. Finally, we evaluate whether these feature representation techniques are sensitive to speaker information. Our results show that a recurrent neural network architecture in conjunction with all three feature sets yields the best results for the task.

We investigate approaches to classifying texts into either conspiracy theory or mainstream using the Language Of Conspiracy (LOCO) corpus. Since conspiracy theories are not monolithic constructs, we need to identify approaches that robustly work in an out-of- domain setting (i.e., across conspiracy topics). We investigate whether optimal in-domain set- tings can be transferred to out-of-domain set- tings, and we investigate different methods for bleaching to steer classifiers away from words typical for an individual conspiracy theory. We find that BART works better than an SVM, that we can successfully classify out-of-domain, but there are no clear trends in how to choose the best source training domains. Addition- ally, bleaching only topic words works better than bleaching all content words or completely delexicalizing texts.

pdf bib
Proceedings of the 21st International Workshop on Treebanks and Linguistic Theories (TLT, GURT/SyntaxFest 2023)
Daniel Dakota | Kilian Evang | Sandra Kübler | Lori Levin
Proceedings of the 21st International Workshop on Treebanks and Linguistic Theories (TLT, GURT/SyntaxFest 2023)

pdf bib abs
Towards a Swahili Universal Dependency Treebank: Leveraging the Annotations of the Helsinki Corpus of Swahili
Kenneth Steimel | Sandra Kübler
Proceedings of the Fourth workshop on Resources for African Indigenous Languages (RAIL 2023)

Dependency annotation can be a laborious process for under-resourced languages. However, in some cases, other resources are available. We investigate whether we can leverage such resources in the case of Swahili: We use the Helsinki Corpus of Swahili for creating a Universal Depedencies treebank for Swahili. The Helsinki Corpus of Swahili provides word-level annotations for part of speech tags, morphological features, and functional syntactic tags. We train neural taggers for these types of annotations, then use those models to annotate our target corpus, the Swahili portion of the OPUS Global Voices Corpus. Based on those annotations, we then manually create constraint grammar rules to annotate the target corpus for Universal Dependencies. In this paper, we describe the process, discuss the annotation decisions we had to make, and we evaluate the approach.

2022

pdf bib abs
Improving POS Tagging for Arabic Dialects on Out-of-Domain Texts
Noor Abo Mokh | Daniel Dakota | Sandra Kübler
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)

We investigate part of speech tagging for four Arabic dialects (Gulf, Levantine, Egyptian, and Maghrebi), in an out-of-domain setting. More specifically, we look at the effectiveness of 1) upsampling the target dialect in the training data of a joint model, 2) increasing the consistency of the annotations, and 3) using word embeddings pre-trained on a large corpus of dialectal Arabic. We increase the accuracy on average by about 20 percentage points.

pdf bib abs
IUCL at WASSA 2022 Shared Task: A Text-only Approach to Empathy and Emotion Detection
Yue Chen | Yingnan Ju | Sandra Kübler
Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis

Our system, IUCL, participated in the WASSA 2022 Shared Task on Empathy Detection and Emotion Classification. Our main goal in building this system is to investigate how the use of demographic attributes influences performance. Our (official) results show that our text-only systems perform very competitively, ranking first in the empathy detection task, reaching an average Pearson correlation of 0.54, and second in the emotion classification task, reaching a Macro-F of 0.572. Our systems that use both text and demographic data are less competitive.

pdf bib abs
How to Parse a Creole: When Martinican Creole Meets French
Ludovic Mompelat | Daniel Dakota | Sandra Kübler
Proceedings of the 29th International Conference on Computational Linguistics

We investigate methods to develop a parser for Martinican Creole, a highly under-resourced language, using a French treebank. We compare transfer learning and multi-task learning models and examine different input features and strategies to handle the massive size imbalance between the treebanks. Surprisingly, we find that a simple concatenated (French + Martinican Creole) baseline yields optimal results even though it has access to only 80 Martinican Creole sentences. POS embeddings work better than lexical ones, but they suffer from negative transfer.

pdf bib
Proceedings of the 16th Linguistic Annotation Workshop (LAW-XVI) within LREC2022
Sameer Pradhan | Sandra Kuebler
Proceedings of the 16th Linguistic Annotation Workshop (LAW-XVI) within LREC2022

Conspiracy theories have found a new channel on the internet and spread by bringing together like-minded people, thus functioning as an echo chamber. The new 88-million word corpus Language of Conspiracy (LOCO) was created with the intention to provide a text collection to study how the language of conspiracy differs from mainstream language. We use this corpus to develop a robust annotation scheme that will allow us to distinguish between documents containing conspiracy language and documents that do not contain any conspiracy content or that propagate conspiracy theories via misinformation (which we explicitly disregard in our work). We find that focusing on indicators of a belief in a conspiracy combined with textual cues of conspiracy language allows us to reach a substantial agreement (based on Fleiss’ kappa and Krippendorff’s alpha). We also find that the automatic retrieval methods used to collect the corpus work well in finding mainstream documents, but include some documents in the conspiracy category that would not belong there based on our definition.

2021

pdf bib
Proceedings of the 20th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2021)
Daniel Dakota | Kilian Evang | Sandra Kübler
Proceedings of the 20th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2021)

pdf bib abs
On the Interaction between Annotation Quality and Classifier Performance in Abusive Language Detection
Holly Lopez Long | Alexandra O’Neil | Sandra Kübler
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Abusive language detection has become an important tool for the cultivation of safe online platforms. We investigate the interaction of annotation quality and classifier performance. We use a new, fine-grained annotation scheme that allows us to distinguish between abusive language and colloquial uses of profanity that are not meant to harm. Our results show a tendency of crowd workers to overuse the abusive class, which creates an unrealistic class balance and affects classification accuracy. We also investigate different methods of distinguishing between explicit and implicit abuse and show lexicon-based approaches either over- or under-estimate the proportion of explicit abuse in data sets.

pdf bib abs
Delexicalized Cross-lingual Dependency Parsing for Xibe
He Zhou | Sandra Kübler
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Manually annotating a treebank is time-consuming and labor-intensive. We conduct delexicalized cross-lingual dependency parsing experiments, where we train the parser on one language and test on our target language. As our test case, we use Xibe, a severely under-resourced Tungusic language. We assume that choosing a closely related language as the source language will provide better results than more distant relatives. However, it is not clear how to determine those closely related languages. We investigate three different methods: choosing the typologically closest language, using LangRank, and choosing the most similar language based on perplexity. We train parsing models on the selected languages using UDify and test on different genres of Xibe data. The results show that languages selected based on typology and perplexity scores outperform those predicted by LangRank; Japanese is the optimal source language. In determining the source language, proximity to the target language is more important than large training sizes. Parsing is also influenced by genre differences, but they have little influence as long as the training data is at least as complex as the target.

pdf bib abs
Period Classification in Chinese Historical Texts
Zuoyu Tian | Sandra Kübler
Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

In this study, we study language change in Chinese Biji by using a classification task: classifying Ancient Chinese texts by time periods. Specifically, we focus on a unique genre in classical Chinese literature: Biji (literally “notebook” or “brush notes”), i.e., collections of anecdotes, quotations, etc., anything authors consider noteworthy, Biji span hundreds of years across many dynasties and conserve informal language in written form. For these reasons, they are regarded as a good resource for investigating language change in Chinese (Fang, 2010). In this paper, we create a new dataset of 108 Biji across four dynasties. Based on the dataset, we first introduce a time period classification task for Chinese. Then we investigate different feature representation methods for classification. The results show that models using contextualized embeddings perform best. An analysis of the top features chosen by the word n-gram model (after bleaching proper nouns) confirms that these features are informative and correspond to observations and assumptions made by historical linguists.

pdf bib abs
Bidirectional Domain Adaptation Using Weighted Multi-Task Learning
Daniel Dakota | Zeeshan Ali Sayyed | Sandra Kübler
Proceedings of the 17th International Conference on Parsing Technologies and the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies (IWPT 2021)

Domain adaption in syntactic parsing is still a significant challenge. We address the issue of data imbalance between the in-domain and out-of-domain treebank typically used for the problem. We define domain adaptation as a Multi-task learning (MTL) problem, which allows us to train two parsers, one for each do-main. Our results show that the MTL approach is beneficial for the smaller treebank. For the larger treebank, we need to use loss weighting in order to avoid a decrease in performance be-low the single task. In order to determine towhat degree the data imbalance between two domains and the domain differences affect results, we also carry out an experiment with two imbalanced in-domain treebanks and show that loss weighting also improves performance in an in-domain setting. Given loss weighting in MTL, we can improve results for both parsers.

pdf bib
What’s in a Span? Evaluating the Creativity of a Span-Based Neural Constituency Parser
Daniel Dakota | Sandra Kübler
Proceedings of the Society for Computation in Linguistics 2021

2020

pdf bib abs
Investigating Sampling Bias in Abusive Language Detection
Dante Razo | Sandra Kübler
Proceedings of the Fourth Workshop on Online Abuse and Harms

Abusive language detection is becoming increasingly important, but we still understand little about the biases in our datasets for abusive language detection, and how these biases affect the quality of abusive language detection. In the work reported here, we reproduce the investigation of Wiegand et al. (2019) to determine differences between different sampling strategies. They compared boosted random sampling, where abusive posts are upsampled, and biased topic sampling, which focuses on topics that are known to cause abusive language. Instead of comparing individual datasets created using these sampling strategies, we use the sampling strategies on a single, large dataset, thus eliminating the textual source of the dataset as a potential confounding factor. We show that differences in the textual source can have more effect than the chosen sampling strategy.

pdf bib
Fine-Grained Morpho-Syntactic Analysis for the Under-Resourced Language Chaghatay
Kenneth Steimel | Akbar Amat | Arienne Dwyer | Sandra Kübler
Proceedings of the 19th International Workshop on Treebanks and Linguistic Theories

pdf bib abs
Offensive Language Detection Using Brown Clustering
Zuoyu Tian | Sandra Kübler
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this study, we investigate the use of Brown clustering for offensive language detection. Brown clustering has been shown to be of little use when the task involves distinguishing word polarity in sentiment analysis tasks. In contrast to previous work, we train Brown clusters separately on positive and negative sentiment data, but then combine the information into a single complex feature per word. This way of representing words results in stable improvements in offensive language detection, when used as the only features or in combination with words or character n-grams. Brown clusters add important information, even when combined with words or character n-grams or with standard word embeddings in a convolutional neural network. However, we also found different trends between the two offensive language data sets we used.

pdf bib abs
Universal Dependency Treebank for Xibe
He Zhou | Juyeon Chung | Sandra Kübler | Francis Tyers
Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020)

We present our work of constructing the first treebank for the Xibe language following the Universal Dependencies (UD) annotation scheme. Xibe is a low-resourced and severely endangered Tungusic language spoken by the Xibe minority living in the Xinjiang Uygur Autonomous Region of China. We collected 810 sentences so far, including 544 sentences from a grammar book on written Xibe and 266 sentences from Cabcal News. We annotated those sentences manually from scratch. In this paper, we report the procedure of building this treebank and analyze several important annotation issues of our treebank. Finally, we propose our plans for future work.

Despite the tremendous recent progress on natural language inference (NLI), driven largely by large-scale investment in new datasets (e.g.,SNLI, MNLI) and advances in modeling, most progress has been limited to English due to a lack of reliable datasets for most of the world’s languages. In this paper, we present the first large-scale NLI dataset (consisting of ~56,000 annotated sentence pairs) for Chinese called the Original Chinese Natural Language Inference dataset (OCNLI). Unlike recent attempts at extending NLI to other languages, our dataset does not rely on any automatic translation or non-expert annotation. Instead, we elicit annotations from native speakers specializing in linguistics. We follow closely the annotation protocol used for MNLI, but create new strategies for eliciting diverse hypotheses. We establish several baseline results on our dataset using state-of-the-art pre-trained models for Chinese, and find even the best performing models to be far outpaced by human performance (~12% absolute performance gap), making it a challenging new resource that we hope will help to accelerate progress in Chinese NLU. To the best of our knowledge, this is the first human-elicited MNLI-style corpus for a non-English language.

2019

pdf bib abs
UM-IU@LING at SemEval-2019 Task 6: Identifying Offensive Tweets Using BERT and SVMs
Jian Zhu | Zuoyu Tian | Sandra Kübler
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes the UM-IU@LING’s system for the SemEval 2019 Task 6: Offens-Eval. We take a mixed approach to identify and categorize hate speech in social media. In subtask A, we fine-tuned a BERT based classifier to detect abusive content in tweets, achieving a macro F1 score of 0.8136 on the test data, thus reaching the 3rd rank out of 103 submissions. In subtasks B and C, we used a linear SVM with selected character n-gram features. For subtask C, our system could identify the target of abuse with a macro F1 score of 0.5243, ranking it 27th out of 65 submissions.

pdf bib abs
Investigating Multilingual Abusive Language Detection: A Cautionary Tale
Kenneth Steimel | Daniel Dakota | Yue Chen | Sandra Kübler
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Abusive language detection has received much attention in the last years, and recent approaches perform the task in a number of different languages. We investigate which factors have an effect on multilingual settings, focusing on the compatibility of data and annotations. In the current paper, we focus on English and German. Our findings show large differences in performance between the two languages. We find that the best performance is achieved by different classification algorithms. Sampling to address class imbalance issues is detrimental for German and beneficial for English. The only similarity that we find is that neither data set shows clear topics when we compare the results of topic modeling to the gold standard. Based on our findings, we can conclude that a multilingual optimization of classifiers is not possible even in settings where comparable data sets are used.

2018

pdf bib
Practical Parsing for Downstream Applications
Daniel Dakota | Sandra Kübler
Proceedings of the 27th International Conference on Computational Linguistics: Tutorial Abstracts

pdf bib abs
Detecting Syntactic Features of Translated Chinese
Hai Hu | Wen Li | Sandra Kübler
Proceedings of the Second Workshop on Stylistic Variation

We present a machine learning approach to distinguish texts translated to Chinese (by humans) from texts originally written in Chinese, with a focus on a wide range of syntactic features. Using Support Vector Machines (SVMs) as classifier on a genre-balanced corpus in translation studies of Chinese, we find that constituent parse trees and dependency triples as features without lexical information perform very well on the task, with an F-measure above 90%, close to the results of lexical n-gram features, without the risk of learning topic information rather than translation features. Thus, we claim syntactic features alone can accurately distinguish translated from original Chinese. Translated Chinese exhibits an increased use of determiners, subject position pronouns, NP + “的” as NP modifiers, multiple NPs or VPs conjoined by "、", among other structures. We also interpret the syntactic features with reference to previous translation studies in Chinese, particularly the usage of pronouns.

pdf bib
Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology
Sandra Kuebler | Garrett Nicolai
Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology

2017

pdf bib abs
Native Language Identification using Phonetic Algorithms
Charese Smiley | Sandra Kübler
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

In this paper, we discuss the results of the IUCL system in the NLI Shared Task 2017. For our system, we explore a variety of phonetic algorithms to generate features for Native Language Identification. These features are contrasted with one of the most successful type of features in NLI, character n-grams. We find that although phonetic features do not perform as well as character n-grams alone, they do increase overall F1 score when used together with character n-grams.

pdf bib abs
Creating POS Tagging and Dependency Parsing Experts via Topic Modeling
Atreyee Mukherjee | Sandra Kübler | Matthias Scheutz
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Part of speech (POS) taggers and dependency parsers tend to work well on homogeneous datasets but their performance suffers on datasets containing data from different genres. In our current work, we investigate how to create POS tagging and dependency parsing experts for heterogeneous data by employing topic modeling. We create topic models (using Latent Dirichlet Allocation) to determine genres from a heterogeneous dataset and then train an expert for each of the genres. Our results show that the topic modeling experts reach substantial improvements when compared to the general versions. For dependency parsing, the improvement reaches 2 percent points over the full training baseline when we use two topics.

pdf bib abs
Towards Replicability in Parsing
Daniel Dakota | Sandra Kübler
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

We investigate parsing replicability across 7 languages (and 8 treebanks), showing that choices concerning the use of grammatical functions in parsing or evaluation, the influence of the rare word threshold, as well as choices in test sentences and evaluation script options have considerable and often unexpected effects on parsing accuracies. All of those choices need to be carefully documented if we want to ensure replicability.

pdf bib abs
Non-Deterministic Segmentation for Chinese Lattice Parsing
Hai Hu | Daniel Dakota | Sandra Kübler
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

Parsing Chinese critically depends on correct word segmentation for the parser since incorrect segmentation inevitably causes incorrect parses. We investigate a pipeline approach to segmentation and parsing using word lattices as parser input. We compare CRF-based and lexicon-based approaches to word segmentation. Our results show that the lattice parser is capable of selecting the correction segmentation from thousands of options, thus drastically reducing the number of unparsed sentence. Lexicon-based parsing models have a better coverage than the CRF-based approach, but the many options are more difficult to handle. We reach our best result by using a lexicon from the n-best CRF analyses, combined with highly probable words.

pdf bib abs
Similarity Based Genre Identification for POS Tagging Experts & Dependency Parsing
Atreyee Mukherjee | Sandra Kübler
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

POS tagging and dependency parsing achieve good results for homogeneous datasets. However, these tasks are much more difficult on heterogeneous datasets. In (Mukherjee et al. 2016, 2017), we address this issue by creating genre experts for both POS tagging and parsing. We use topic modeling to automatically separate training and test data into genres and to create annotation experts per genre by training separate models for each topic. However, this approach assumes that topic modeling is performed jointly on training and test sentences each time a new test sentence is encountered. We extend this work by assigning new test sentences to their genre expert by using similarity metrics. We investigate three different types of methods: 1) based on words highly associated with a genre by the topic modeler, 2) using a k-nearest neighbor classification approach, and 3) using perplexity to determine the closest topic. The results show that the choice of similarity metric has an effect on results and that we can reach comparable accuracies to the joint topic modeling in POS tagging and dependency parsing, thus providing a viable and efficient approach to POS tagging and parsing a sentence by its genre expert.

Parser evaluation traditionally relies on evaluation metrics which deliver a single aggregate score over all sentences in the parser output, such as PARSEVAL. However, for the evaluation of parser performance concerning a particular phenomenon, a test suite of sentences is needed in which this phenomenon has been identified. In recent years, the parsing of discontinuous structures has received a rising interest. Therefore, in this paper, we present a test suite for testing the performance of dependency and constituency parsers on non-projective dependencies and discontinuous constituents for German. The test suite is based on the newly released TIGER treebank version 2.2. It provides a unique possibility of benchmarking parsers on non-local syntactic relationships in German, for constituents and dependencies. We include a linguistic analysis of the phenomena that cause discontinuity in the TIGER annotation, thereby closing gaps in previous literature. The linguistic phenomena we investigate include extraposition, a placeholder/repeated element construction, topicalization, scrambling, local movement, parentheticals, and fronting of pronouns.

pdf bib abs
SWIFT Aligner, A Multifunctional Tool for Parallel Corpora: Visualization, Word Alignment, and (Morpho)-Syntactic Cross-Language Transfer
Timur Gilmanov | Olga Scrivner | Sandra Kübler
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

It is well known that word aligned parallel corpora are valuable linguistic resources. Since many factors affect automatic alignment quality, manual post-editing may be required in some applications. While there are several state-of-the-art word-aligners, such as GIZA++ and Berkeley, there is no simple visual tool that would enable correcting and editing aligned corpora of different formats. We have developed SWIFT Aligner, a free, portable software that allows for visual representation and editing of aligned corpora from several most commonly used formats: TALP, GIZA, and NAACL. In addition, our tool has incorporated part-of-speech and syntactic dependency transfer from an annotated source language into an unannotated target language, by means of word-alignment.

pdf bib
Feature Selection for Highly Skewed Sentiment Analysis Tasks
Can Liu | Sandra Kübler | Ning Yu
Proceedings of the Second Workshop on Natural Language Processing for Social Media (SocialNLP)

pdf bib
Parsing German: How Much Morphology Do We Need?
Wolfgang Maier | Sandra Kübler | Daniel Dakota | Daniel Whyatt
Proceedings of the First Joint Workshop on Statistical Parsing of Morphologically Rich Languages and Syntactic Analysis of Non-Canonical Languages

pdf bib
Introducing the SPMRL 2014 Shared Task on Parsing Morphologically-rich Languages
Djamé Seddah | Sandra Kübler | Reut Tsarfaty
Proceedings of the First Joint Workshop on Statistical Parsing of Morphologically Rich Languages and Syntactic Analysis of Non-Canonical Languages

pdf bib
IUCL: Combining Information Sources for SemEval Task 5
Alex Rudnick | Levi King | Can Liu | Markus Dickinson | Sandra Kübler
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

2013

pdf bib
ASMA: A System for Automatic Segmentation and Morpho-Syntactic Disambiguation of Modern Standard Arabic
Muhammad Abdul-Mageed | Mona Diab | Sandra Kübler
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

pdf bib
Domain Adaptation for Parsing
Eric Baucom | Levi King | Sandra Kübler
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

pdf bib
Towards Domain Adaptation for Parsing Web Data
Mohammad Khan | Markus Dickinson | Sandra Kübler
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

pdf bib
Machine Learning for Mention Head Detection in Multilingual Coreference Resolution
Desislava Zhekova | Sandra Kübler
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

pdf bib
Does Size Matter? Text and Grammar Revision for Parsing Social Media Data
Mohammad Khan | Markus Dickinson | Sandra Kuebler
Proceedings of the Workshop on Language Analysis in Social Media

pdf bib
Parsing Morphologically Rich Languages: Introduction to the Special Issue
Reut Tsarfaty | Djamé Seddah | Sandra Kübler | Joakim Nivre
Computational Linguistics, Volume 39, Issue 1 - March 2013

2012

pdf bib
Predicting Learner Levels for Online Exercises of Hebrew
Markus Dickinson | Sandra Kübler | Anthony Meyer
Proceedings of the Seventh Workshop on Building Educational Applications Using NLP

pdf bib
Annotating Coordination in the Penn Treebank
Wolfgang Maier | Sandra Kübler | Erhard Hinrichs | Julia Krivanek
Proceedings of the Sixth Linguistic Annotation Workshop

pdf bib
SAMAR: A System for Subjectivity and Sentiment Analysis of Arabic Social Media
Muhammad Abdul-Mageed | Sandra Kuebler | Mona Diab
Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis

pdf bib
UBIU for Multilingual Coreference Resolution in OntoNotes
Desislava Zhekova | Sandra Kübler | Joshua Bonner | Marwa Ragheb | Yu-Yin Hsu
Joint Conference on EMNLP and CoNLL - Shared Task

2011

pdf bib
Fast Domain Adaptation for Part of Speech Tagging for Dialogues
Sandra Kübler | Eric Baucom
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

pdf bib
Actions Speak Louder than Words: Evaluating Parsers in the Context of Natural Language Understanding Systems for Human-Robot Interaction
Sandra Kübler | Rachael Cantrell | Matthias Scheutz
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

pdf bib
Singletons and Coreference Resolution Evaluation
Sandra Kübler | Desislava Zhekova
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

pdf bib
Filling the Gap: Semi-Supervised Learning for Opinion Detection Across Domains
Ning Yu | Sandra Kübler
Proceedings of the Fifteenth Conference on Computational Natural Language Learning

pdf bib
UBIU: A Robust System for Resolving Unrestricted Coreference
Desislava Zhekova | Sandra Kübler
Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task

pdf bib
Towards a Malay Derivational Lexicon: Learning Affixes Using Expectation Maximization
Suriani Sulaiman | Michael Gasser | Sandra Kuebler
Proceedings of the 2nd Workshop on South Southeast Asian Natural Language Processing (WSSANLP)

2010

pdf bib
UBIU: A Language-Independent System for Coreference Resolution
Desislava Zhekova | Sandra Kübler
Proceedings of the 5th International Workshop on Semantic Evaluation

pdf bib
Proceedings of the ACL 2010 System Demonstrations
Sandra Kübler
Proceedings of the ACL 2010 System Demonstrations

pdf bib
Chunking German: An Unsolved Problem
Sandra Kübler | Kathrin Beck | Erhard Hinrichs | Heike Telljohann
Proceedings of the Fourth Linguistic Annotation Workshop

pdf bib
Is Arabic Part of Speech Tagging Feasible Without Word Segmentation?
Emad Mohamed | Sandra Kübler
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib abs
Arabic Part of Speech Tagging
Emad Mohamed | Sandra Kübler
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Arabic is a morphologically rich language, which presents a challenge for part of speech tagging. In this paper, we compare two novel methods for POS tagging of Arabic without the use of gold standard word segmentation but with the full POS tagset of the Penn Arabic Treebank. The first approach uses complex tags that describe full words and does not require any word segmentation. The second approach is segmentation-based, using a machine learning segmenter. In this approach, the words are first segmented, then the segments are annotated with POS tags. Because of the word-based approach, we evaluate full word accuracy rather than segment accuracy. Word-based POS tagging yields better results than segment-based tagging (93.93% vs. 93.41%). Word based tagging also gives the best results on known words, the segmentation-based approach gives better results on unknown words. Combining both methods results in a word accuracy of 94.37%, which is very close to the result obtained by using gold standard segmentation (94.91%).

pdf bib abs
The Indiana “Cooperative Remote Search Task” (CReST) Corpus
Kathleen Eberhard | Hannele Nicholson | Sandra Kübler | Susan Gundersen | Matthias Scheutz
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper introduces a novel corpus of natural language dialogues obtained from humans performing a cooperative, remote, search task (CReST) as it occurs naturally in a variety of scenarios (e.g., search and rescue missions in disaster areas). This corpus is unique in that it involves remote collaborations between two interlocutors who each have to perform tasks that require the other's assistance. In addition, one interlocutor's tasks require physical movement through an indoor environment as well as interactions with physical objects within the environment. The multi-modal corpus contains the speech signals as well as transcriptions of the dialogues, which are additionally annotated for dialog structure, disfluencies, and for constituent and dependency syntax. On the dialogue level, the corpus was annotated for separate dialogue moves, based on the classification developed by Carletta et al. (1997) for coding task-oriented dialogues. Disfluencies were annotated using the scheme developed by Lickley (1998). The syntactic annotation comprises POS annotation, Penn Treebank style constituent annotations as well as dependency annotations based on the dependencies of pennconverter.

2009

pdf bib
Parsing Coordinations
Sandra Kübler | Erhard Hinrichs | Wolfgang Maier | Eva Klett
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

pdf bib
Semi-Supervised Learning for Word Sense Disambiguation: Quality vs. Quantity
Sandra Kübler | Desislava Zhekova
Proceedings of the International Conference RANLP-2009

pdf bib
Diacritization for Real-World Arabic Texts
Emad Mohamed | Sandra Kübler
Proceedings of the International Conference RANLP-2009

pdf bib
Instance Sampling Methods for Pronoun Resolution
Holger Wunsch | Sandra Kübler | Rachael Cantrell
Proceedings of the International Conference RANLP-2009

2008

pdf bib
Proceedings of the Workshop on Parsing German
Sandra Kübler | Gerald Penn
Proceedings of the Workshop on Parsing German

pdf bib
The PaGe 2008 Shared Task on Parsing German
Sandra Kübler
Proceedings of the Workshop on Parsing German

pdf bib abs
How to Compare Treebanks
Sandra Kübler | Wolfgang Maier | Ines Rehbein | Yannick Versley
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Recent years have seen an increasing interest in developing standards for linguistic annotation, with a focus on the interoperability of the resources. This effort, however, requires a profound knowledge of the advantages and disadvantages of linguistic annotation schemes in order to avoid importing the flaws and weaknesses of existing encoding schemes into the new standards. This paper addresses the question how to compare syntactically annotated corpora and gain insights into the usefulness of specific design decisions. We present an exhaustive evaluation of two German treebanks with crucially different encoding schemes. We evaluate three different parsers trained on the two treebanks and compare results using EvalB, the Leaf-Ancestor metric, and a dependency-based evaluation. Furthermore, we present TePaCoC, a new testsuite for the evaluation of parsers on complex German grammatical constructions. The testsuite provides a well thought-out error classification, which enables us to compare parser output for parsers trained on treebanks with different encoding schemes and provides interesting insights into the impact of treebank annotation schemes on specific constructions like PP attachment or non-constituent coordination.

pdf bib abs
POS Tagging for German: how important is the Right Context?
Steliana Ivanova | Sandra Kuebler
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Part-of-Speech tagging is generally performed by Markov models, based on bigram or trigram models. While Markov models have a strong concentration on the left context of a word, many languages require the inclusion of right context for correct disambiguation. We show for German that the best results are reached by a combination of left and right context. If only left context is available, then changing the direction of analysis and going from right to left improves the results. In a version of MBT with default parameter settings, the inclusion of the right context improved POS tagging accuracy from 94.00% to 96.08%, thus corroborating our hypothesis. The version with optimized parameters reaches 96.73%.