Anders Søgaard

Also published as: Anders Sogaard


2021

pdf bib
Itihasa: A large-scale corpus for Sanskrit to English translation
Rahul Aralikatte | Miryam de Lhoneux | Anoop Kunchukuttan | Anders Søgaard
Proceedings of the 8th Workshop on Asian Translation (WAT2021)

This work introduces Itihasa, a large-scale translation dataset containing 93,000 pairs of Sanskrit shlokas and their English translations. The shlokas are extracted from two Indian epics viz., The Ramayana and The Mahabharata. We first describe the motivation behind the curation of such a dataset and follow up with empirical analysis to bring out its nuances. We then benchmark the performance of standard translation models on this corpus and show that even state-of-the-art transformer architectures perform poorly, emphasizing the complexity of the dataset.

pdf bib
How far can we get with one GPU in 100 hours? CoAStaL at MultiIndicMT Shared Task
Rahul Aralikatte | Héctor Ricardo Murrieta Bello | Miryam de Lhoneux | Daniel Hershcovich | Marcel Bollmann | Anders Søgaard
Proceedings of the 8th Workshop on Asian Translation (WAT2021)

This work shows that competitive translation results can be obtained in a constrained setting by incorporating the latest advances in memory and compute optimization. We train and evaluate large multilingual translation models using a single GPU for a maximum of 100 hours and get within 4-5 BLEU points of the top submission on the leaderboard. We also benchmark standard baselines on the PMI corpus and re-discover well-known shortcomings of translation systems and metrics.

pdf bib
Multilingual Negation Scope Resolution for Clinical Text
Mareike Hartmann | Anders Søgaard
Proceedings of the 12th International Workshop on Health Text Mining and Information Analysis

Negation scope resolution is key to high-quality information extraction from clinical texts, but so far, efforts to make encoders used for information extraction negation-aware have been limited to English. We present a universal approach to multilingual negation scope resolution, that overcomes the lack of training data by relying on disparate resources in different languages and domains. We evaluate two approaches to learn from these resources, training on combined data and training in a multi-task learning setup. Our experiments show that zero-shot scope resolution in clinical text is possible, and that combining available resources improves performance in most cases.

pdf bib
Ellipsis Resolution as Question Answering: An Evaluation
Rahul Aralikatte | Matthew Lamm | Daniel Hardt | Anders Søgaard
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Most, if not all forms of ellipsis (e.g., so does Mary) are similar to reading comprehension questions (what does Mary do), in that in order to resolve them, we need to identify an appropriate text span in the preceding discourse. Following this observation, we present an alternative approach for English ellipsis resolution relying on architectures developed for question answering (QA). We present both single-task models, and joint models trained on auxiliary QA and coreference resolution datasets, clearly outperforming the current state of the art for Sluice Ellipsis (from 70.00 to 86.01 F1) and Verb Phrase Ellipsis (from 72.89 to 78.66 F1).

pdf bib
We Need To Talk About Random Splits
Anders Søgaard | Sebastian Ebert | Jasmijn Bastings | Katja Filippova
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

(CITATION) argued for using random splits rather than standard splits in NLP experiments. We argue that random splits, like standard splits, lead to overly optimistic performance estimates. We can also split data in biased or adversarial ways, e.g., training on short sentences and evaluating on long ones. Biased sampling has been used in domain adaptation to simulate real-world drift; this is known as the covariate shift assumption. In NLP, however, even worst-case splits, maximizing bias, often under-estimate the error observed on new samples of in-domain data, i.e., the data that models should minimally generalize to at test time. This invalidates the covariate shift assumption. Instead of using multiple random splits, future benchmarks should ideally include multiple, independent test sets instead; if infeasible, we argue that multiple biased splits leads to more realistic performance estimates than multiple random splits.

pdf bib
Error Analysis and the Role of Morphology
Marcel Bollmann | Anders Søgaard
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

We evaluate two common conjectures in error analysis of NLP models: (i) Morphology is predictive of errors; and (ii) the importance of morphology increases with the morphological complexity of a language. We show across four different tasks and up to 57 languages that of these conjectures, somewhat surprisingly, only (i) is true. Using morphological features does improve error prediction across tasks; however, this effect is less pronounced with morphologically complex languages. We speculate this is because morphology is more discriminative in morphologically simple languages. Across all four tasks, case and gender are the morphological features most predictive of error.

pdf bib
Attention Can Reflect Syntactic Structure (If You Let It)
Vinit Ravishankar | Artur Kulmizev | Mostafa Abdou | Anders Søgaard | Joakim Nivre
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Since the popularization of the Transformer as a general-purpose feature encoder for NLP, many studies have attempted to decode linguistic structure from its novel multi-head attention mechanism. However, much of such work focused almost exclusively on English — a language with rigid word order and a lack of inflectional morphology. In this study, we present decoding experiments for multilingual BERT across 18 languages in order to test the generalizability of the claim that dependency syntax is reflected in attention patterns. We show that full trees can be decoded above baseline accuracy from single attention heads, and that individual relations are often tracked by the same heads across languages. Furthermore, in an attempt to address recent debates about the status of attention as an explanatory mechanism, we experiment with fine-tuning mBERT on a supervised parsing objective while freezing different series of parameters. Interestingly, in steering the objective to learn explicit linguistic structure, we find much of the same structure represented in the resulting attention patterns, with interesting differences with respect to which parameters are frozen.

pdf bib
Spurious Correlations in Cross-Topic Argument Mining
Terne Sasha Thorn Jakobsen | Maria Barrett | Anders Søgaard
Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics

Recent work in cross-topic argument mining attempts to learn models that generalise across topics rather than merely relying on within-topic spurious correlations. We examine the effectiveness of this approach by analysing the output of single-task and multi-task models for cross-topic argument mining, through a combination of linear approximations of their decision boundaries, manual feature grouping, challenge examples, and ablations across the input vocabulary. Surprisingly, we show that cross-topic models still rely mostly on spurious correlations and only generalise within closely related topics, e.g., a model trained only on closed-class words and a few common open-class words outperforms a state-of-the-art cross-topic model on distant target topics.

pdf bib
Replicating and Extending “Because Their Treebanks Leak”: Graph Isomorphism, Covariants, and Parser Performance
Mark Anderson | Anders Søgaard | Carlos Gómez-Rodríguez
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Søgaard (2020) obtained results suggesting the fraction of trees occurring in the test data isomorphic to trees in the training set accounts for a non-trivial variation in parser performance. Similar to other statistical analyses in NLP, the results were based on evaluating linear regressions. However, the study had methodological issues and was undertaken using a small sample size leading to unreliable results. We present a replication study in which we also bin sentences by length and find that only a small subset of sentences vary in performance with respect to graph isomorphism. Further, the correlation observed between parser performance and graph isomorphism in the wild disappears when controlling for covariants. However, in a controlled experiment, where covariants are kept fixed, we do observe a correlation. We suggest that conclusions drawn from statistical analyses like this need to be tempered and that controlled experiments can complement them by more readily teasing factors apart.

pdf bib
Minimax and Neyman–Pearson Meta-Learning for Outlier Languages
Edoardo Maria Ponti | Rahul Aralikatte | Disha Shrivastava | Siva Reddy | Anders Søgaard
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
On the Interaction of Belief Bias and Explanations
Ana Valeria González | Anna Rogers | Anders Søgaard
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Is the Lottery Fair? Evaluating Winning Tickets Across Demographics
Victor Petrén Bach Hansen | Anders Søgaard
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
John praised Mary because _he_? Implicit Causality Bias and Its Interaction with Explicit Cues in LMs
Yova Kementchedjhieva | Mark Anderson | Anders Søgaard
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Moses and the Character-Based Random Babbling Baseline: CoAStaL at AmericasNLP 2021 Shared Task
Marcel Bollmann | Rahul Aralikatte | Héctor Murrieta Bello | Daniel Hershcovich | Miryam de Lhoneux | Anders Søgaard
Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas

We evaluated a range of neural machine translation techniques developed specifically for low-resource scenarios. Unsuccessfully. In the end, we submitted two runs: (i) a standard phrase-based model, and (ii) a random babbling baseline using character trigrams. We found that it was surprisingly hard to beat (i), in spite of this model being, in theory, a bad fit for polysynthetic languages; and more interestingly, that (ii) was better than several of the submitted systems, highlighting how difficult low-resource machine translation for polysynthetic languages is.

pdf bib
Guideline Bias in Wizard-of-Oz Dialogues
Victor Petrén Bach Hansen | Anders Søgaard
Proceedings of the 1st Workshop on Benchmarking: Past, Present and Future

NLP models struggle with generalization due to sampling and annotator bias. This paper focuses on a different kind of bias that has received very little attention: guideline bias, i.e., the bias introduced by how our annotator guidelines are formulated. We examine two recently introduced dialogue datasets, CCPE-M and Taskmaster-1, both collected by trained assistants in a Wizard-of-Oz set-up. For CCPE-M, we show how a simple lexical bias for the word like in the guidelines biases the data collection. This bias, in effect, leads to poor performance on data without this bias: a preference elicitation architecture based on BERT suffers a 5.3% absolute drop in performance, when like is replaced with a synonymous phrase, and a 13.2% drop in performance when evaluated on out-of-sample data. For Taskmaster-1, we show how the order in which instructions are resented, biases the data collection.

2020

pdf bib
Model-based Annotation of Coreference
Rahul Aralikatte | Anders Søgaard
Proceedings of the 12th Language Resources and Evaluation Conference

Humans do not make inferences over texts, but over models of what texts are about. When annotators are asked to annotate coreferent spans of text, it is therefore a somewhat unnatural task. This paper presents an alternative in which we preprocess documents, linking entities to a knowledge base, and turn the coreference annotation task – in our case limited to pronouns – into an annotation task where annotators are asked to assign pronouns to entities. Model-based annotation is shown to lead to faster annotation and higher inter-annotator agreement, and we argue that it also opens up an alternative approach to coreference resolution. We present two new coreference benchmark datasets, for English Wikipedia and English teacher-student dialogues, and evaluate state-of-the-art coreference resolvers on them.

pdf bib
WikiBank: Using Wikidata to Improve Multilingual Frame-Semantic Parsing
Cezar Sas | Meriem Beloucif | Anders Søgaard
Proceedings of the 12th Language Resources and Evaluation Conference

Frame-semantic annotations exist for a tiny fraction of the world’s languages, Wikidata, however, links knowledge base triples to texts in many languages, providing a common, distant supervision signal for semantic parsers. We present WikiBank, a multilingual resource of partial semantic structures that can be used to extend pre-existing resources rather than creating new man-made resources from scratch. We also integrate this form of supervision into an off-the-shelf frame-semantic parser and allow cross-lingual transfer. Using Google’s Sling architecture, we show significant improvements on the English and Spanish CoNLL 2009 datasets, whether training on the full available datasets or small subsamples thereof.

pdf bib
DaNE: A Named Entity Resource for Danish
Rasmus Hvingelby | Amalie Brogaard Pauli | Maria Barrett | Christina Rosted | Lasse Malm Lidegaard | Anders Søgaard
Proceedings of the 12th Language Resources and Evaluation Conference

We present a named entity annotation for the Danish Universal Dependencies treebank using the CoNLL-2003 annotation scheme: DaNE. It is the largest publicly available, Danish named entity gold annotation. We evaluate the quality of our annotations intrinsically by double annotating the entire treebank and extrinsically by comparing our annotations to a recently released named entity annotation of the validation and test sections of the Danish Universal Dependencies treebank. We benchmark the new resource by training and evaluating competitive architectures for supervised named entity recognition (NER), including FLAIR, monolingual (Danish) BERT and multilingual BERT. We explore cross-lingual transfer in multilingual BERT from five related languages in zero-shot and direct transfer setups, and we show that even with our modestly-sized training set, we improve Danish NER over a recent cross-lingual approach, as well as over zero-shot transfer from five related languages. Using multilingual BERT, we achieve higher performance by fine-tuning on both DaNE and a larger Bokmål (Norwegian) training set compared to only using DaNE. However, the highest performance isachieved by using a Danish BERT fine-tuned on DaNE. Our dataset enables improvements and applicability for Danish NER beyond cross-lingual methods. We employ a thorough error analysis of the predictions of the best models for seen and unseen entities, as well as their robustness on un-capitalized text. The annotated dataset and all the trained models are made publicly available.

pdf bib
Neural Speed Reading Audited
Anders Søgaard
Findings of the Association for Computational Linguistics: EMNLP 2020

Several approaches to neural speed reading have been presented at major NLP and machine learning conferences in 2017–20; i.e., “human-inspired” recurrent network architectures that learn to “read” text faster by skipping irrelevant words, typically optimizing the joint objective of minimizing classification error rate and FLOPs used at inference time. This paper reflects on the meaningfulness of the speed reading task, showing that (a) better and faster approaches to, say, document classification, already exist, which also learn to ignore part of the input (I give an example with 7% error reduction and a 136x speed-up over the state of the art in neural speed reading); and that (b) any claims that neural speed reading is “human-inspired”, are ill-founded.

pdf bib
The Sensitivity of Language Models and Humans to Winograd Schema Perturbations
Mostafa Abdou | Vinit Ravishankar | Maria Barrett | Yonatan Belinkov | Desmond Elliott | Anders Søgaard
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Large-scale pretrained language models are the major driving force behind recent improvements in perfromance on the Winograd Schema Challenge, a widely employed test of commonsense reasoning ability. We show, however, with a new diagnostic dataset, that these models are sensitive to linguistic perturbations of the Winograd examples that minimally affect human understanding. Our results highlight interesting differences between humans and language models: language models are more sensitive to number or gender alternations and synonym replacements than humans, and humans are more stable and consistent in their predictions, maintain a much higher absolute performance, and perform better on non-associative instances than associative ones.

pdf bib
Type B Reflexivization as an Unambiguous Testbed for Multilingual Multi-Task Gender Bias
Ana Valeria González | Maria Barrett | Rasmus Hvingelby | Kellie Webster | Anders Søgaard
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The one-sided focus on English in previous studies of gender bias in NLP misses out on opportunities in other languages: English challenge datasets such as GAP and WinoGender highlight model preferences that are “hallucinatory”, e.g., disambiguating gender-ambiguous occurrences of ‘doctor’ as male doctors. We show that for languages with type B reflexivization, e.g., Swedish and Russian, we can construct multi-task challenge datasets for detecting gender bias that lead to unambiguously wrong model predictions: In these languages, the direct translation of ‘the doctor removed his mask’ is not ambiguous between a coreferential reading and a disjoint reading. Instead, the coreferential reading requires a non-gendered pronoun, and the gendered, possessive pronouns are anti-reflexive. We present a multilingual, multi-task challenge dataset, which spans four languages and four NLP tasks and focuses only on this phenomenon. We find evidence for gender bias across all task-language combinations and correlate model bias with national labor market statistics.

pdf bib
Some Languages Seem Easier to Parse Because Their Treebanks Leak
Anders Søgaard
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Cross-language differences in (universal) dependency parsing performance are mostly attributed to treebank size, average sentence length, average dependency length, morphological complexity, and domain differences. We point at a factor not previously discussed: If we abstract away from words and dependency labels, how many graphs in the test data were seen in the training data? We compute graph isomorphisms, and show that, treebank size aside, overlap between training and test graphs explain more of the observed variation than standard explanations such as the above.

pdf bib
Are All Good Word Vector Spaces Isomorphic?
Ivan Vulić | Sebastian Ruder | Anders Søgaard
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Existing algorithms for aligning cross-lingual word vector spaces assume that vector spaces are approximately isomorphic. As a result, they perform poorly or fail completely on non-isomorphic spaces. Such non-isomorphism has been hypothesised to result from typological differences between languages. In this work, we ask whether non-isomorphism is also crucially a sign of degenerate word vector spaces. We present a series of experiments across diverse languages which show that variance in performance across language pairs is not only due to typological differences, but can mostly be attributed to the size of the monolingual resources available, and to the properties and duration of monolingual training (e.g. “under-training”).

pdf bib
Grammatical Error Correction in Low Error Density Domains: A New Benchmark and Analyses
Simon Flachs | Ophélie Lacroix | Helen Yannakoudakis | Marek Rei | Anders Søgaard
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Evaluation of grammatical error correction (GEC) systems has primarily focused on essays written by non-native learners of English, which however is only part of the full spectrum of GEC applications. We aim to broaden the target domain of GEC and release CWEB, a new benchmark for GEC consisting of website text generated by English speakers of varying levels of proficiency. Website data is a common and important domain that contains far fewer grammatical errors than learner essays, which we show presents a challenge to state-of-the-art GEC systems. We demonstrate that a factor behind this is the inability of systems to rely on a strong internal language model in low error density domains. We hope this work shall facilitate the development of open-domain GEC models that generalize to different topics and genres.

pdf bib
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies
Gosse Bouma | Yuji Matsumoto | Stephan Oepen | Kenji Sagae | Djamé Seddah | Weiwei Sun | Anders Søgaard | Reut Tsarfaty | Dan Zeman
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies

2019

pdf bib
A systematic comparison of methods for low-resource dependency parsing on genuinely low-resource languages
Clara Vania | Yova Kementchedjhieva | Anders Søgaard | Adam Lopez
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Parsers are available for only a handful of the world’s languages, since they require lots of training data. How far can we get with just a small amount of training data? We systematically compare a set of simple strategies for improving low-resource parsers: data augmentation, which has not been tested before; cross-lingual training; and transliteration. Experimenting on three typologically diverse low-resource languages—North Sámi, Galician, and Kazah—We find that (1) when only the low-resource treebank is available, data augmentation is very helpful; (2) when a related high-resource treebank is available, cross-lingual training is helpful and complements data augmentation; and (3) when the high-resource treebank uses a different writing system, transliteration into a shared orthographic spaces is also very helpful.

pdf bib
Rewarding Coreference Resolvers for Being Consistent with World Knowledge
Rahul Aralikatte | Heather Lent | Ana Valeria Gonzalez | Daniel Herschcovich | Chen Qiu | Anders Sandholm | Michael Ringaard | Anders Søgaard
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Unresolved coreference is a bottleneck for relation extraction, and high-quality coreference resolvers may produce an output that makes it a lot easier to extract knowledge triples. We show how to improve coreference resolvers by forwarding their input to a relation extraction system and reward the resolvers for producing triples that are found in knowledge bases. Since relation extraction systems can rely on different forms of supervision and be biased in different ways, we obtain the best performance, improving over the state of the art, using multi-task reinforcement learning.

pdf bib
Lost in Evaluation: Misleading Benchmarks for Bilingual Dictionary Induction
Yova Kementchedjhieva | Mareike Hartmann | Anders Søgaard
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

The task of bilingual dictionary induction (BDI) is commonly used for intrinsic evaluation of cross-lingual word embeddings. The largest dataset for BDI was generated automatically, so its quality is dubious. We study the composition and quality of the test sets for five diverse languages from this dataset, with concerning findings: (1) a quarter of the data consists of proper nouns, which can be hardly indicative of BDI performance, and (2) there are pervasive gaps in the gold-standard targets. These issues appear to affect the ranking between cross-lingual embedding systems on individual languages, and the overall degree to which the systems differ in performance. With proper nouns removed from the data, the margin between the top two systems included in the study grows from 3.4% to 17.2%. Manual verification of the predictions, on the other hand, reveals that gaps in the gold standard targets artificially inflate the margin between the two systems on English to Bulgarian BDI from 0.1% to 6.7%. We thus suggest that future research either avoids drawing conclusions from quantitative results on this BDI dataset, or accompanies such evaluation with rigorous error analysis.

pdf bib
Higher-order Comparisons of Sentence Encoder Representations
Mostafa Abdou | Artur Kulmizev | Felix Hill | Daniel M. Low | Anders Søgaard
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Representational Similarity Analysis (RSA) is a technique developed by neuroscientists for comparing activity patterns of different measurement modalities (e.g., fMRI, electrophysiology, behavior). As a framework, RSA has several advantages over existing approaches to interpretation of language encoders based on probing or diagnostic classification: namely, it does not require large training samples, is not prone to overfitting, and it enables a more transparent comparison between the representational geometries of different models and modalities. We demonstrate the utility of RSA by establishing a previously unknown correspondence between widely-employed pretrained language encoders and human processing difficulty via eye-tracking data, showcasing its potential in the interpretability toolbox for neural models.

pdf bib
Adversarial Removal of Demographic Attributes Revisited
Maria Barrett | Yova Kementchedjhieva | Yanai Elazar | Desmond Elliott | Anders Søgaard
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Elazar and Goldberg (2018) showed that protected attributes can be extracted from the representations of a debiased neural network for mention detection at above-chance levels, by evaluating a diagnostic classifier on a held-out subsample of the data it was trained on. We revisit their experiments and conduct a series of follow-up experiments showing that, in fact, the diagnostic classifier generalizes poorly to both new in-domain samples and new domains, indicating that it relies on correlations specific to their particular data sample. We further show that a diagnostic classifier trained on the biased baseline neural network also does not generalize to new samples. In other words, the biases detected in Elazar and Goldberg (2018) seem restricted to their particular data sample, and would therefore not bias the decisions of the model on new samples, whether in-domain or out-of-domain. In light of this, we discuss better methodologies for detecting bias in our models.

pdf bib
Few-Shot and Zero-Shot Learning for Historical Text Normalization
Marcel Bollmann | Natalia Korchagina | Anders Søgaard
Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)

Historical text normalization often relies on small training datasets. Recent work has shown that multi-task learning can lead to significant improvements by exploiting synergies with related datasets, but there has been no systematic study of different multi-task learning architectures. This paper evaluates 63 multi-task learning configurations for sequence-to-sequence-based historical text normalization across ten datasets from eight languages, using autoencoding, grapheme-to-phoneme mapping, and lemmatization as auxiliary tasks. We observe consistent, significant improvements across languages when training data for the target task is limited, but minimal or no improvements when training data is abundant. We also show that zero-shot learning outperforms the simple, but relatively strong, identity baseline.

pdf bib
X-WikiRE: A Large, Multilingual Resource for Relation Extraction as Machine Comprehension
Mostafa Abdou | Cezar Sas | Rahul Aralikatte | Isabelle Augenstein | Anders Søgaard
Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)

Although the vast majority of knowledge bases (KBs) are heavily biased towards English, Wikipedias do cover very different topics in different languages. Exploiting this, we introduce a new multilingual dataset (X-WikiRE), framing relation extraction as a multilingual machine reading problem. We show that by leveraging this resource it is possible to robustly transfer models cross-lingually and that multilingual support significantly improves (zero-shot) relation extraction, enabling the population of low-resourced KBs from their well-populated counterparts.

pdf bib
Issue Framing in Online Discussion Fora
Mareike Hartmann | Tallulah Jansen | Isabelle Augenstein | Anders Søgaard
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

In online discussion fora, speakers often make arguments for or against something, say birth control, by highlighting certain aspects of the topic. In social science, this is referred to as issue framing. In this paper, we introduce a new issue frame annotated corpus of online discussions. We explore to what extent models trained to detect issue frames in newswire and social media can be transferred to the domain of discussion fora, using a combination of multi-task and adversarial training, assuming only unlabeled training data in the target domain.

pdf bib
A Simple and Robust Approach to Detecting Subject-Verb Agreement Errors
Simon Flachs | Ophélie Lacroix | Marek Rei | Helen Yannakoudakis | Anders Søgaard
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

While rule-based detection of subject-verb agreement (SVA) errors is sensitive to syntactic parsing errors and irregularities and exceptions to the main rules, neural sequential labelers have a tendency to overfit their training data. We observe that rule-based error generation is less sensitive to syntactic parsing errors and irregularities than error detection and explore a simple, yet efficient approach to getting the best of both worlds: We train neural sequential labelers on the combination of large volumes of silver standard data, obtained through rule-based error generation, and gold standard data. We show that our simple protocol leads to more robust detection of SVA errors on both in-domain and out-of-domain data, as well as in the context of other errors and long-distance dependencies; and across four standard benchmarks, the induced model on average achieves a new state of the art.

pdf bib
Better, Faster, Stronger Sequence Tagging Constituent Parsers
David Vilares | Mostafa Abdou | Anders Søgaard
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Sequence tagging models for constituent parsing are faster, but less accurate than other types of parsers. In this work, we address the following weaknesses of such constituent parsers: (a) high error rates around closing brackets of long constituents, (b) large label sets, leading to sparsity, and (c) error propagation arising from greedy decoding. To effectively close brackets, we train a model that learns to switch between tagging schemes. To reduce sparsity, we decompose the label set and use multi-task learning to jointly learn to predict sublabels. Finally, we mitigate issues from greedy decoding through auxiliary losses and sentence-level fine-tuning with policy gradient. Combining these techniques, we clearly surpass the performance of sequence tagging constituent parsers on the English and Chinese Penn Treebanks, and reduce their parsing time even further. On the SPMRL datasets, we observe even greater improvements across the board, including a new state of the art on Basque, Hebrew, Polish and Swedish.

pdf bib
Noisy Channel for Low Resource Grammatical Error Correction
Simon Flachs | Ophélie Lacroix | Anders Søgaard
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

This paper describes our contribution to the low-resource track of the BEA 2019 shared task on Grammatical Error Correction (GEC). Our approach to GEC builds on the theory of the noisy channel by combining a channel model and language model. We generate confusion sets from the Wikipedia edit history and use the frequencies of edits to estimate the channel model. Additionally, we use two pre-trained language models: 1) Google’s BERT model, which we fine-tune for specific error types and 2) OpenAI’s GPT-2 model, utilizing that it can operate with previous sentences as context. Furthermore, we search for the optimal combinations of corrections using beam search.

pdf bib
Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing
Joakim Nivre | Leon Derczynski | Filip Ginter | Bjørn Lindi | Stephan Oepen | Anders Søgaard | Jörg Tidemann
Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing

pdf bib
CoAStaL at SemEval-2019 Task 3: Affect Classification in Dialogue using Attentive BiLSTMs
Ana Valeria González | Victor Petrén Bach Hansen | Joachim Bingel | Anders Søgaard
Proceedings of the 13th International Workshop on Semantic Evaluation

This work describes the system presented by the CoAStaL Natural Language Processing group at University of Copenhagen. The main system we present uses the same attention mechanism presented in (Yang et al., 2016). Our overall model architecture is also inspired by their hierarchical classification model and adapted to deal with classification in dialogue by encoding information at the turn level. We use different encodings for each turn to create a more expressive representation of dialogue context which is then fed into our classifier.We also define a custom preprocessing step in order to deal with language commonly used in interactions across many social media outlets. Our proposed system achieves a micro F1 score of 0.7340 on the test set and shows significant gains in performance compared to a system using dialogue level encoding.

pdf bib
Historical Text Normalization with Delayed Rewards
Simon Flachs | Marcel Bollmann | Anders Søgaard
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Training neural sequence-to-sequence models with simple token-level log-likelihood is now a standard approach to historical text normalization, albeit often outperformed by phrase-based models. Policy gradient training enables direct optimization for exact matches, and while the small datasets in historical text normalization are prohibitive of from-scratch reinforcement learning, we show that policy gradient fine-tuning leads to significant improvements across the board. Policy gradient training, in particular, leads to more accurate normalizations for long or unseen words.

pdf bib
Multi-Task Semantic Dependency Parsing with Policy Gradient for Learning Easy-First Strategies
Shuhei Kurita | Anders Søgaard
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In Semantic Dependency Parsing (SDP), semantic relations form directed acyclic graphs, rather than trees. We propose a new iterative predicate selection (IPS) algorithm for SDP. Our IPS algorithm combines the graph-based and transition-based parsing approaches in order to handle multiple semantic head words. We train the IPS model using a combination of multi-task learning and task-specific policy gradient training. Trained this way, IPS achieves a new state of the art on the SemEval 2015 Task 18 datasets. Furthermore, we observe that policy gradient training learns an easy-first strategy.

pdf bib
Unsupervised Cross-Lingual Representation Learning
Sebastian Ruder | Anders Søgaard | Ivan Vulić
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

In this tutorial, we provide a comprehensive survey of the exciting recent work on cutting-edge weakly-supervised and unsupervised cross-lingual word representations. After providing a brief history of supervised cross-lingual word representations, we focus on: 1) how to induce weakly-supervised and unsupervised cross-lingual word representations in truly resource-poor settings where bilingual supervision cannot be guaranteed; 2) critical examinations of different training conditions and requirements under which unsupervised algorithms can and cannot work effectively; 3) more robust methods for distant language pairs that can mitigate instability issues and low performance for distant language pairs; 4) how to comprehensively evaluate such representations; and 5) diverse applications that benefit from cross-lingual word representations (e.g., MT, dialogue, cross-lingual sequence labeling and structured prediction applications, cross-lingual IR).

pdf bib
Naive Regularizers for Low-Resource Neural Machine Translation
Meriem Beloucif | Ana Valeria Gonzalez | Marcel Bollmann | Anders Søgaard
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Neural machine translation models have little inductive bias, which can be a disadvantage in low-resource scenarios. Neural models have to be trained on large amounts of data and have been shown to perform poorly when only limited data is available. We show that using naive regularization methods, based on sentence length, punctuation and word frequencies, to penalize translations that are very different from the input sentences, consistently improves the translation quality across multiple low-resource languages. We experiment with 12 language pairs, varying the training data size between 17k to 230k sentence pairs. Our best regularizer achieves an average increase of 1.5 BLEU score and 1.0 TER score across all the language pairs. For example, we achieve a BLEU score of 26.70 on the IWSLT15 English–Vietnamese translation task simply by using relative differences in punctuation as a regularizer.

2018

pdf bib
Lexi: A tool for adaptive, personalized text simplification
Joachim Bingel | Gustavo Paetzold | Anders Søgaard
Proceedings of the 27th International Conference on Computational Linguistics

Most previous research in text simplification has aimed to develop generic solutions, assuming very homogeneous target audiences with consistent intra-group simplification needs. We argue that this assumption does not hold, and that instead we need to develop simplification systems that adapt to the individual needs of specific users. As a first step towards personalized simplification, we propose a framework for adaptive lexical simplification and introduce Lexi, a free open-source and easily extensible tool for adaptive, personalized text simplification. Lexi is easily installed as a browser extension, enabling easy access to the service for its users.

pdf bib
Zero-Shot Sequence Labeling: Transferring Knowledge from Sentences to Tokens
Marek Rei | Anders Søgaard
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Can attention- or gradient-based visualization techniques be used to infer token-level labels for binary sequence tagging problems, using networks trained only on sentence-level labels? We construct a neural network architecture based on soft attention, train it as a binary sentence classifier and evaluate against token-level annotation on four different datasets. Inferring token labels from a network provides a method for quantitatively evaluating what the model is learning, along with generating useful feedback in assistance systems. Our results indicate that attention-based methods are able to predict token-level labels more accurately, compared to gradient-based methods, sometimes even rivaling the supervised oracle network.

pdf bib
Multi-Task Learning of Pairwise Sequence Classification Tasks over Disparate Label Spaces
Isabelle Augenstein | Sebastian Ruder | Anders Søgaard
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

We combine multi-task learning and semi-supervised learning by inducing a joint embedding space between disparate label spaces and learning transfer functions between label embeddings, enabling us to jointly leverage unlabelled data and auxiliary, annotated datasets. We evaluate our approach on a variety of tasks with disparate label spaces. We outperform strong single and multi-task baselines and achieve a new state of the art for aspect-based and topic-based sentiment analysis.

pdf bib
Unsupervised Induction of Linguistic Categories with Records of Reading, Speaking, and Writing
Maria Barrett | Ana Valeria González-Garduño | Lea Frermann | Anders Søgaard
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

When learning POS taggers and syntactic chunkers for low-resource languages, different resources may be available, and often all we have is a small tag dictionary, motivating type-constrained unsupervised induction. Even small dictionaries can improve the performance of unsupervised induction algorithms. This paper shows that performance can be further improved by including data that is readily available or can be easily obtained for most languages, i.e., eye-tracking, speech, or keystroke logs (or any combination thereof). We project information from all these data sources into shared spaces, in which the union of words is represented. For English unsupervised POS induction, the additional information, which is not required at test time, leads to an average error reduction on Ontonotes domains of 1.5% over systems augmented with state-of-the-art word embeddings. On Penn Treebank the best model achieves 5.4% error reduction over a word embeddings baseline. We also achieve significant improvements for syntactic chunk induction. Our analysis shows that improvements are even bigger when the available tag dictionaries are smaller.

pdf bib
Sluice Resolution without Hand-Crafted Features over Brittle Syntax Trees
Ola Rønning | Daniel Hardt | Anders Søgaard
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

Sluice resolution in English is the problem of finding antecedents of wh-fronted ellipses. Previous work has relied on hand-crafted features over syntax trees that scale poorly to other languages and domains; in particular, to dialogue, which is one of the most interesting applications of sluice resolution. Syntactic information is arguably important for sluice resolution, but we show that multi-task learning with partial parsing as auxiliary tasks effectively closes the gap and buys us an additional 9% error reduction over previous work. Since we are not directly relying on features from partial parsers, our system is more robust to domain shifts, giving a 26% error reduction on embedded sluices in dialogue.

pdf bib
Generalizing Procrustes Analysis for Better Bilingual Dictionary Induction
Yova Kementchedjhieva | Sebastian Ruder | Ryan Cotterell | Anders Søgaard
Proceedings of the 22nd Conference on Computational Natural Language Learning

Most recent approaches to bilingual dictionary induction find a linear alignment between the word vector spaces of two languages. We show that projecting the two languages onto a third, latent space, rather than directly onto each other, while equivalent in terms of expressivity, makes it easier to learn approximate alignments. Our modified approach also allows for supporting languages to be included in the alignment process, to obtain an even better performance in low resource settings.

pdf bib
Sequence Classification with Human Attention
Maria Barrett | Joachim Bingel | Nora Hollenstein | Marek Rei | Anders Søgaard
Proceedings of the 22nd Conference on Computational Natural Language Learning

Learning attention functions requires large volumes of data, but many NLP tasks simulate human behavior, and in this paper, we show that human attention really does provide a good inductive bias on many attention functions in NLP. Specifically, we use estimated human attention derived from eye-tracking corpora to regularize attention functions in recurrent neural networks. We show substantial improvements across a range of tasks, including sentiment analysis, grammatical error detection, and detection of abusive language.

pdf bib
A Danish FrameNet Lexicon and an Annotated Corpus Used for Training and Evaluating a Semantic Frame Classifier
Bolette Pedersen | Sanni Nimb | Anders Søgaard | Mareike Hartmann | Sussi Olsen
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
On the Limitations of Unsupervised Bilingual Dictionary Induction
Anders Søgaard | Sebastian Ruder | Ivan Vulić
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Unsupervised machine translation - i.e., not assuming any cross-lingual supervision signal, whether a dictionary, translations, or comparable corpora - seems impossible, but nevertheless, Lample et al. (2017) recently proposed a fully unsupervised machine translation (MT) model. The model relies heavily on an adversarial, unsupervised cross-lingual word embedding technique for bilingual dictionary induction (Conneau et al., 2017), which we examine here. Our results identify the limitations of current unsupervised MT: unsupervised bilingual dictionary induction performs much worse on morphologically rich languages that are not dependent marking, when monolingual corpora from different domains or different embedding algorithms are used. We show that a simple trick, exploiting a weak supervision signal from identical words, enables more robust induction and establish a near-perfect correlation between unsupervised bilingual dictionary induction performance and a previously unexplored graph similarity metric.

pdf bib
Proceedings of the Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP
Georgiana Dinu | Miguel Ballesteros | Avirup Sil | Sam Bowman | Wael Hamza | Anders Sogaard | Tahira Naseem | Yoav Goldberg
Proceedings of the Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP

pdf bib
Limitations of Cross-Lingual Learning from Image Search
Mareike Hartmann | Anders Søgaard
Proceedings of The Third Workshop on Representation Learning for NLP

Cross-lingual representation learning is an important step in making NLP scale to all the world’s languages. Previous work on bilingual lexicon induction suggests that it is possible to learn cross-lingual representations of words based on similarities between images associated with these words. However, that work focused (almost exclusively) on the translation of nouns only. Here, we investigate whether the meaning of other parts-of-speech (POS), in particular adjectives and verbs, can be learned in the same way. Our experiments across five language pairs indicate that previous work does not scale to the problem of learning cross-lingual representations beyond simple nouns.

pdf bib
Character-level Supervision for Low-resource POS Tagging
Katharina Kann | Johannes Bjerva | Isabelle Augenstein | Barbara Plank | Anders Søgaard
Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP

Neural part-of-speech (POS) taggers are known to not perform well with little training data. As a step towards overcoming this problem, we present an architecture for learning more robust neural POS taggers by jointly training a hierarchical, recurrent model and a recurrent character-based sequence-to-sequence network supervised using an auxiliary objective. This way, we introduce stronger character-level supervision into the model, which enables better generalization to unseen words and provides regularization, making our encoding less prone to overfitting. We experiment with three auxiliary tasks: lemmatization, character-based word autoencoding, and character-based random string autoencoding. Experiments with minimal amounts of labeled data on 34 languages show that our new architecture outperforms a single-task baseline and, surprisingly, that, on average, raw text autoencoding can be as beneficial for low-resource POS tagging as using lemma information. Our neural POS tagger closes the gap to a state-of-the-art POS tagger (MarMoT) for low-resource scenarios by 43%, even outperforming it on languages with templatic morphology, e.g., Arabic, Hebrew, and Turkish, by some margin.

pdf bib
Multi-task learning for historical text normalization: Size matters
Marcel Bollmann | Anders Søgaard | Joachim Bingel
Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP

Historical text normalization suffers from small datasets that exhibit high variance, and previous work has shown that multi-task learning can be used to leverage data from related problems in order to obtain more robust models. Previous work has been limited to datasets from a specific language and a specific historical period, and it is not clear whether results generalize. It therefore remains an open problem, when historical text normalization benefits from multi-task learning. We explore the benefits of multi-task learning across 10 different datasets, representing different languages and periods. Our main finding—contrary to what has been observed for other NLP tasks—is that multi-task learning mainly works when target task data is very scarce.

pdf bib
When does deep multi-task learning work for loosely related document classification tasks?
Emma Kerinec | Chloé Braud | Anders Søgaard
Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

This work aims to contribute to our understanding of when multi-task learning through parameter sharing in deep neural networks leads to improvements over single-task learning. We focus on the setting of learning from loosely related tasks, for which no theoretical guarantees exist. We therefore approach the question empirically, studying which properties of datasets and single-task learning characteristics correlate with improvements from multi-task learning. We are the first to study this in a text classification setting and across more than 500 different task pairs.

pdf bib
Nightmare at test time: How punctuation prevents parsers from generalizing
Anders Søgaard | Miryam de Lhoneux | Isabelle Augenstein
Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

Punctuation is a strong indicator of syntactic structure, and parsers trained on text with punctuation often rely heavily on this signal. Punctuation is a diversion, however, since human language processing does not rely on punctuation to the same extent, and in informal texts, we therefore often leave out punctuation. We also use punctuation ungrammatically for emphatic or creative purposes, or simply by mistake. We show that (a) dependency parsers are sensitive to both absence of punctuation and to alternative uses; (b) neural parsers tend to be more sensitive than vintage parsers; (c) training neural parsers without punctuation outperforms all out-of-the-box parsers across all scenarios where punctuation departs from standard punctuation. Our main experiments are on synthetically corrupted data to study the effect of punctuation in isolation and avoid potential confounds, but we also show effects on out-of-domain data.

pdf bib
Linguistic representations in multi-task neural networks for ellipsis resolution
Ola Rønning | Daniel Hardt | Anders Søgaard
Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

Sluicing resolution is the task of identifying the antecedent to a question ellipsis. Antecedents are often sentential constituents, and previous work has therefore relied on syntactic parsing, together with complex linguistic features. A recent model instead used partial parsing as an auxiliary task in sequential neural network architectures to inject syntactic information. We explore the linguistic information being brought to bear by such networks, both by defining subsets of the data exhibiting relevant linguistic characteristics, and by examining the internal representations of the network. Both perspectives provide evidence for substantial linguistic knowledge being deployed by the neural networks.

pdf bib
Sentiment analysis under temporal shift
Jan Lukes | Anders Søgaard
Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Sentiment analysis models often rely on training data that is several years old. In this paper, we show that lexical features change polarity over time, leading to degrading performance. This effect is particularly strong in sparse models relying only on highly predictive features. Using predictive feature selection, we are able to significantly improve the accuracy of such models over time.

pdf bib
A Discriminative Latent-Variable Model for Bilingual Lexicon Induction
Sebastian Ruder | Ryan Cotterell | Yova Kementchedjhieva | Anders Søgaard
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We introduce a novel discriminative latent-variable model for the task of bilingual lexicon induction. Our model combines the bipartite matching dictionary prior of Haghighi et al. (2008) with a state-of-the-art embedding-based approach. To train the model, we derive an efficient Viterbi EM algorithm. We provide empirical improvements on six language pairs under two metrics and show that the prior theoretically and empirically helps to mitigate the hubness problem. We also demonstrate how previous work may be viewed as a similarly fashioned latent-variable model, albeit with a different prior.

pdf bib
Why is unsupervised alignment of English embeddings from different algorithms so hard?
Mareike Hartmann | Yova Kementchedjhieva | Anders Søgaard
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

This paper presents a challenge to the community: Generative adversarial networks (GANs) can perfectly align independent English word embeddings induced using the same algorithm, based on distributional information alone; but fails to do so, for two different embeddings algorithms. Why is that? We believe understanding why, is key to understand both modern word embedding algorithms and the limitations and instability dynamics of GANs. This paper shows that (a) in all these cases, where alignment fails, there exists a linear transform between the two embeddings (so algorithm biases do not lead to non-linear differences), and (b) similar effects can not easily be obtained by varying hyper-parameters. One plausible suggestion based on our initial experiments is that the differences in the inductive biases of the embedding algorithms lead to an optimization landscape that is riddled with local optima, leading to a very small basin of convergence, but we present this more as a challenge paper than a technical contribution.

pdf bib
A strong baseline for question relevancy ranking
Ana Gonzalez | Isabelle Augenstein | Anders Søgaard
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

The best systems at the SemEval-16 and SemEval-17 community question answering shared tasks – a task that amounts to question relevancy ranking – involve complex pipelines and manual feature engineering. Despite this, many of these still fail at beating the IR baseline, i.e., the rankings provided by Google’s search engine. We present a strong baseline for question relevancy ranking by training a simple multi-task feed forward network on a bag of 14 distance measures for the input question pair. This baseline model, which is fast to train and uses only language-independent features, outperforms the best shared task systems on the task of retrieving relevant previously asked questions.

pdf bib
Parameter sharing between dependency parsers for related languages
Miryam de Lhoneux | Johannes Bjerva | Isabelle Augenstein | Anders Søgaard
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Previous work has suggested that parameter sharing between transition-based neural dependency parsers for related languages can lead to better performance, but there is no consensus on what parameters to share. We present an evaluation of 27 different parameter sharing strategies across 10 languages, representing five pairs of related languages, each pair from a different language family. We find that sharing transition classifier parameters always helps, whereas the usefulness of sharing word and/or character LSTM parameters varies. Based on this result, we propose an architecture where the transition classifier is shared, and the sharing of word and character parameters is controlled by a parameter that can be tuned on validation data. This model is linguistically motivated and obtains significant improvements over a monolingually trained baseline. We also find that sharing transition classifier parameters helps when training a parser on unrelated language pairs, but we find that, in the case of unrelated languages, sharing too many parameters does not help.

2017

pdf bib
Evaluating hypotheses in geolocation on a very large sample of Twitter
Bahar Salehi | Anders Søgaard
Proceedings of the 3rd Workshop on Noisy User-generated Text

Recent work in geolocation has made several hypotheses about what linguistic markers are relevant to detect where people write from. In this paper, we examine six hypotheses against a corpus consisting of all geo-tagged tweets from the US, or whose geo-tags could be inferred, in a 19% sample of Twitter history. Our experiments lend support to all six hypotheses, including that spelling variants and hashtags are strong predictors of location. We also study what kinds of common nouns are predictive of location after controlling for named entities such as dolphins or sharks

pdf bib
Huntsville, hospitals, and hockey teams: Names can reveal your location
Bahar Salehi | Dirk Hovy | Eduard Hovy | Anders Søgaard
Proceedings of the 3rd Workshop on Noisy User-generated Text

Geolocation is the task of identifying a social media user’s primary location, and in natural language processing, there is a growing literature on to what extent automated analysis of social media posts can help. However, not all content features are equally revealing of a user’s location. In this paper, we evaluate nine name entity (NE) types. Using various metrics, we find that GEO-LOC, FACILITY and SPORT-TEAM are more informative for geolocation than other NE types. Using these types, we improve geolocation accuracy and reduce distance error over various famous text-based methods.

pdf bib
Is writing style predictive of scientific fraud?
Chloé Braud | Anders Søgaard
Proceedings of the Workshop on Stylistic Variation

The problem of detecting scientific fraud using machine learning was recently introduced, with initial, positive results from a model taking into account various general indicators. The results seem to suggest that writing style is predictive of scientific fraud. We revisit these initial experiments, and show that the leave-one-out testing procedure they used likely leads to a slight over-estimate of the predictability, but also that simple models can outperform their proposed model by some margin. We go on to explore more abstract linguistic features, such as linguistic complexity and discourse structure, only to obtain negative results. Upon analyzing our models, we do see some interesting patterns, though: Scientific fraud, for examples, contains less comparison, as well as different types of hedging and ways of presenting logical reasoning.

pdf bib
Using Gaze to Predict Text Readability
Ana Valeria González-Garduño | Anders Søgaard
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

We show that text readability prediction improves significantly from hard parameter sharing with models predicting first pass duration, total fixation duration and regression duration. Specifically, we induce multi-task Multilayer Perceptrons and Logistic Regression models over sentence representations that capture various aggregate statistics, from two different text readability corpora for English, as well as the Dundee eye-tracking corpus. Our approach leads to significant improvements over Single task learning and over previous systems. In addition, our improvements are consistent across train sample sizes, making our approach especially applicable to small datasets.

pdf bib
Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP
Samuel Bowman | Yoav Goldberg | Felix Hill | Angeliki Lazaridou | Omer Levy | Roi Reichart | Anders Søgaard
Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP

pdf bib
Using hyperlinks to improve multilingual partial parsers
Anders Søgaard
Proceedings of the 15th International Conference on Parsing Technologies

Syntactic annotation is costly and not available for the vast majority of the world’s languages. We show that sometimes we can do away with less labeled data by exploiting more readily available forms of mark-up. Specifically, we revisit an idea from Valentin Spitkovsky’s work (2010), namely that hyperlinks typically bracket syntactic constituents or chunks. We strengthen his results by showing that not only can hyperlinks help in low resource scenarios, exemplified here by Quechua, but learning from hyperlinks can also improve state-of-the-art NLP models for English newswire. We also present out-of-domain evaluation on English Ontonotes 4.0.

pdf bib
What I think when I think about treebanks
Anders Søgaard
Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories

pdf bib
Learning attention for historical text normalization by learning to pronounce
Marcel Bollmann | Joachim Bingel | Anders Søgaard
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Automated processing of historical texts often relies on pre-normalization to modern word forms. Training encoder-decoder architectures to solve such problems typically requires a lot of training data, which is not available for the named task. We address this problem by using several novel encoder-decoder architectures, including a multi-task learning (MTL) architecture using a grapheme-to-phoneme dictionary as auxiliary data, pushing the state-of-the-art by an absolute 2% increase in performance. We analyze the induced models across 44 different texts from Early New High German. Interestingly, we observe that, as previously conjectured, multi-task learning can learn to focus attention during decoding, in ways remarkably similar to recently proposed attention mechanisms. This, we believe, is an important step toward understanding how MTL works.

pdf bib
Cross-lingual and cross-domain discourse segmentation of entire documents
Chloé Braud | Ophélie Lacroix | Anders Søgaard
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Discourse segmentation is a crucial step in building end-to-end discourse parsers. However, discourse segmenters only exist for a few languages and domains. Typically they only detect intra-sentential segment boundaries, assuming gold standard sentence and token segmentation, and relying on high-quality syntactic parses and rich heuristics that are not generally available across languages and domains. In this paper, we propose statistical discourse segmenters for five languages and three domains that do not rely on gold pre-annotations. We also consider the problem of learning discourse segmenters when no labeled data is available for a language. Our fully supervised system obtains 89.5% F1 for English newswire, with slight drops in performance on other domains, and we report supervised and unsupervised (cross-lingual) results for five languages in total.

pdf bib
Multi-Task Learning of Keyphrase Boundary Classification
Isabelle Augenstein | Anders Søgaard
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Keyphrase boundary classification (KBC) is the task of detecting keyphrases in scientific articles and labelling them with respect to predefined types. Although important in practice, this task is so far underexplored, partly due to the lack of labelled data. To overcome this, we explore several auxiliary tasks, including semantic super-sense tagging and identification of multi-word expressions, and cast the task as a multi-task learning problem with deep recurrent neural networks. Our multi-task models perform significantly better than previous state of the art approaches on two scientific KBC datasets, particularly for long keyphrases.

pdf bib
Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm
Bjarke Felbo | Alan Mislove | Anders Søgaard | Iyad Rahwan | Sune Lehmann
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

NLP tasks are often limited by scarcity of manually annotated data. In social media sentiment analysis and related tasks, researchers have therefore used binarized emoticons and specific hashtags as forms of distant supervision. Our paper shows that by extending the distant supervision to a more diverse set of noisy labels, the models can learn richer representations. Through emoji prediction on a dataset of 1246 million tweets containing one of 64 common emojis we obtain state-of-the-art performance on 8 benchmark datasets within emotion, sentiment and sarcasm detection using a single pretrained model. Our analyses confirm that the diversity of our emotional labels yield a performance improvement over previous distant supervision approaches.

pdf bib
Does syntax help discourse segmentation? Not so much
Chloé Braud | Ophélie Lacroix | Anders Søgaard
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Discourse segmentation is the first step in building discourse parsers. Most work on discourse segmentation does not scale to real-world discourse parsing across languages, for two reasons: (i) models rely on constituent trees, and (ii) experiments have relied on gold standard identification of sentence and token boundaries. We therefore investigate to what extent constituents can be replaced with universal dependencies, or left out completely, as well as how state-of-the-art segmenters fare in the absence of sentence boundaries. Our results show that dependency information is less useful than expected, but we provide a fully scalable, robust model that only relies on part-of-speech information, and show that it performs well across languages in the absence of any gold-standard annotation.

pdf bib
Cross-Lingual Word Representations: Induction and Evaluation
Manaal Faruqui | Anders Søgaard | Ivan Vulić
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

In recent past, NLP as a field has seen tremendous utility of distributional word vector representations as features in downstream tasks. The fact that these word vectors can be trained on unlabeled monolingual corpora of a language makes them an inexpensive resource in NLP. With the increasing use of monolingual word vectors, there is a need for word vectors that can be used as efficiently across multiple languages as monolingually. Therefore, learning bilingual and multilingual word embeddings/vectors is currently an important research topic. These vectors offer an elegant and language-pair independent way to represent content across different languages.This tutorial aims to bring NLP researchers up to speed with the current techniques in cross-lingual word representation learning. We will first discuss how to induce cross-lingual word representations (covering both bilingual and multilingual ones) from various data types and resources (e.g., parallel data, comparable data, non-aligned monolingual data in different languages, dictionaries and theasuri, or, even, images, eye-tracking data). We will then discuss how to evaluate such representations, intrinsically and extrinsically. We will introduce researchers to state-of-the-art methods for constructing cross-lingual word representations and discuss their applicability in a broad range of downstream NLP applications.We will deliver a detailed survey of the current methods, discuss best training and evaluation practices and use-cases, and provide links to publicly available implementations, datasets, and pre-trained models.

pdf bib
Cross-Lingual Dependency Parsing with Late Decoding for Truly Low-Resource Languages
Michael Schlichtkrull | Anders Søgaard
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

In cross-lingual dependency annotation projection, information is often lost during transfer because of early decoding. We present an end-to-end graph-based neural network dependency parser that can be trained to reproduce matrices of edge scores, which can be directly projected across word alignments. We show that our approach to cross-lingual dependency parsing is not only simpler, but also achieves an absolute improvement of 2.25% averaged across 10 languages compared to the previous state of the art.

pdf bib
Parsing Universal Dependencies without training
Héctor Martínez Alonso | Željko Agić | Barbara Plank | Anders Søgaard
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

We present UDP, the first training-free parser for Universal Dependencies (UD). Our algorithm is based on PageRank and a small set of specific dependency head rules. UDP features two-step decoding to guarantee that function words are attached as leaf nodes. The parser requires no training, and it is competitive with a delexicalized transfer system. UDP offers a linguistically sound unsupervised alternative to cross-lingual parsing for UD. The parser has very few parameters and distinctly robust to domain change across languages.

pdf bib
Cross-lingual RST Discourse Parsing
Chloé Braud | Maximin Coavoux | Anders Søgaard
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Discourse parsing is an integral part of understanding information flow and argumentative structure in documents. Most previous research has focused on inducing and evaluating models from the English RST Discourse Treebank. However, discourse treebanks for other languages exist, including Spanish, German, Basque, Dutch and Brazilian Portuguese. The treebanks share the same underlying linguistic theory, but differ slightly in the way documents are annotated. In this paper, we present (a) a new discourse parser which is simpler, yet competitive (significantly better on 2/3 metrics) to state of the art for English, (b) a harmonization of discourse treebanks across languages, enabling us to present (c) what to the best of our knowledge are the first experiments on cross-lingual discourse parsing.

pdf bib
A Strong Baseline for Learning Cross-Lingual Word Embeddings from Sentence Alignments
Omer Levy | Anders Søgaard | Yoav Goldberg
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

While cross-lingual word embeddings have been studied extensively in recent years, the qualitative differences between the different algorithms remain vague. We observe that whether or not an algorithm uses a particular feature set (sentence IDs) accounts for a significant performance gap among these algorithms. This feature set is also used by traditional alignment algorithms, such as IBM Model-1, which demonstrate similar performance to state-of-the-art embedding algorithms on a variety of benchmarks. Overall, we observe that different algorithmic approaches for utilizing the sentence ID feature space result in similar performance. This paper draws both empirical and theoretical parallels between the embedding and alignment literature, and suggests that adding additional sources of information, which go beyond the traditional signal of bilingual sentence-aligned corpora, may substantially improve cross-lingual word embeddings, and that future baselines should at least take such features into account.

pdf bib
Identifying beneficial task relations for multi-task learning in deep neural networks
Joachim Bingel | Anders Søgaard
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

Multi-task learning (MTL) in deep neural networks for NLP has recently received increasing interest due to some compelling benefits, including its potential to efficiently regularize models and to reduce the need for labeled data. While it has brought significant improvements in a number of NLP tasks, mixed results have been reported, and little is known about the conditions under which MTL leads to gains in NLP. This paper sheds light on the specific task relations that can lead to gains from MTL models over single-task setups.

pdf bib
Cross-lingual tagger evaluation without test data
Željko Agić | Barbara Plank | Anders Søgaard
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

We address the challenge of cross-lingual POS tagger evaluation in absence of manually annotated test data. We put forth and evaluate two dictionary-based metrics. On the tasks of accuracy prediction and system ranking, we reveal that these metrics are reliable enough to approximate test set-based evaluation, and at the same time lean enough to support assessment for truly low-resource languages.

2016

pdf bib
Extracting token-level signals of syntactic processing from fMRI - with an application to PoS induction
Joachim Bingel | Maria Barrett | Anders Søgaard
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Deep multi-task learning with low level tasks supervised at lower layers
Anders Søgaard | Yoav Goldberg
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Text Simplification as Tree Labeling
Joachim Bingel | Anders Søgaard
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss
Barbara Plank | Anders Søgaard | Yoav Goldberg
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Joint part-of-speech and dependency projection from multiple sources
Anders Johannsen | Željko Agić | Anders Søgaard
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Weakly Supervised Part-of-speech Tagging Using Eye-tracking Data
Maria Barrett | Joachim Bingel | Frank Keller | Anders Søgaard
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
The SemDaX Corpus ― Sense Annotations with Scalable Sense Inventories
Bolette Pedersen | Anna Braasch | Anders Johannsen | Héctor Martínez Alonso | Sanni Nimb | Sussi Olsen | Anders Søgaard | Nicolai Hartvig Sørensen
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We launch the SemDaX corpus which is a recently completed Danish human-annotated corpus available through a CLARIN academic license. The corpus includes approx. 90,000 words, comprises six textual domains, and is annotated with sense inventories of different granularity. The aim of the developed corpus is twofold: i) to assess the reliability of the different sense annotation schemes for Danish measured by qualitative analyses and annotation agreement scores, and ii) to serve as training and test data for machine learning algorithms with the practical purpose of developing sense taggers for Danish. To these aims, we take a new approach to human-annotated corpus resources by double annotating a much larger part of the corpus than what is normally seen: for the all-words task we double annotated 60% of the material and for the lexical sample task 100%. We include in the corpus not only the adjucated files, but also the diverging annotations. In other words, we consider not all disagreement to be noise, but rather to contain valuable linguistic information that can help us improve our annotation schemes and our learning algorithms.

pdf bib
Evaluating word embeddings with fMRI and eye-tracking
Anders Søgaard
Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP

pdf bib
Learning a POS tagger for AAVE-like language
Anna Jørgensen | Dirk Hovy | Anders Søgaard
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Improving sentence compression by learning to predict gaze
Sigrid Klerke | Yoav Goldberg | Anders Søgaard
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Multilingual Projection for Parsing Truly Low-Resource Languages
Željko Agić | Anders Johannsen | Barbara Plank | Héctor Martínez Alonso | Natalie Schluter | Anders Søgaard
Transactions of the Association for Computational Linguistics, Volume 4

We propose a novel approach to cross-lingual part-of-speech tagging and dependency parsing for truly low-resource languages. Our annotation projection-based approach yields tagging and parsing models for over 100 languages. All that is needed are freely available parallel texts, and taggers and parsers for resource-rich languages. The empirical evaluation across 30 test languages shows that our method consistently provides top-level accuracies, close to established upper bounds, and outperforms several competitive baselines.

pdf bib
Improving historical spelling normalization with bi-directional LSTMs and multi-task learning
Marcel Bollmann | Anders Søgaard
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Natural-language processing of historical documents is complicated by the abundance of variant spellings and lack of annotated data. A common approach is to normalize the spelling of historical words to modern forms. We explore the suitability of a deep neural network architecture for this task, particularly a deep bi-LSTM network applied on a character level. Our model compares well to previously established normalization algorithms when evaluated on a diverse set of texts from Early New High German. We show that multi-task learning with additional normalization data can improve our model’s performance further.

pdf bib
Cross-lingual Transfer of Correlations between Parts of Speech and Gaze Features
Maria Barrett | Frank Keller | Anders Søgaard
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Several recent studies have shown that eye movements during reading provide information about grammatical and syntactic processing, which can assist the induction of NLP models. All these studies have been limited to English, however. This study shows that gaze and part of speech (PoS) correlations largely transfer across English and French. This means that we can replicate previous studies on gaze-based PoS tagging for French, but also that we can use English gaze data to assist the induction of French NLP models.

pdf bib
Multi-view and multi-task training of RST discourse parsers
Chloé Braud | Barbara Plank | Anders Søgaard
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We experiment with different ways of training LSTM networks to predict RST discourse trees. The main challenge for RST discourse parsing is the limited amounts of training data. We combat this by regularizing our models using task supervision from related tasks as well as alternative views on discourse structures. We show that a simple LSTM sequential discourse parser takes advantage of this multi-view and multi-task framework with 12-15% error reductions over our baseline (depending on the metric) and results that rival more complex state-of-the-art parsers.

2015

pdf bib
Any-language frame-semantic parsing
Anders Johannsen | Héctor Martínez Alonso | Anders Søgaard
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Non-canonical language is not harder to annotate than canonical language
Barbara Plank | Héctor Martínez Alonso | Anders Søgaard
Proceedings of The 9th Linguistic Annotation Workshop

pdf bib
Supersense tagging for Danish
Héctor Martínez Alonso | Anders Johannsen | Sussi Olsen | Sanni Nimb | Nicolai Hartvig Sørensen | Anna Braasch | Anders Søgaard | Bolette Sandford Pedersen
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)

pdf bib
Looking hard: Eye tracking for detecting grammaticality of automatically compressed sentences
Sigrid Klerke | Héctor Martínez Alonso | Anders Søgaard
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)

pdf bib
Active learning for sense annotation
Héctor Martínez Alonso | Barbara Plank | Anders Johannsen | Anders Søgaard
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)

pdf bib
Using reading behavior to predict grammatical functions
Maria Barrett | Anders Søgaard
Proceedings of the Sixth Workshop on Cognitive Aspects of Computational Language Learning

pdf bib
Reading metrics for estimating task efficiency with MT output
Sigrid Klerke | Sheila Castilho | Maria Barrett | Anders Søgaard
Proceedings of the Sixth Workshop on Cognitive Aspects of Computational Language Learning

pdf bib
Challenges of studying and processing dialects in social media
Anna Jørgensen | Dirk Hovy | Anders Søgaard
Proceedings of the Workshop on Noisy User-generated Text

pdf bib
Learning finite state word representations for unsupervised Twitter adaptation of POS taggers
Julie Wulff | Anders Søgaard
Proceedings of the Workshop on Noisy User-generated Text

pdf bib
Inverted indexing for cross-lingual NLP
Anders Søgaard | Željko Agić | Héctor Martínez Alonso | Barbara Plank | Bernd Bohnet | Anders Johannsen
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
If all you have is a bit of the Bible: Learning POS taggers for truly low-resource languages
Željko Agić | Dirk Hovy | Anders Søgaard
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
Tagging Performance Correlates with Author Age
Dirk Hovy | Anders Søgaard
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
Unsupervised extractive summarization via coverage maximization with syntactic and semantic concepts
Natalie Schluter | Anders Søgaard
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
Cross-lingual syntactic variation over age and gender
Anders Johannsen | Dirk Hovy | Anders Søgaard
Proceedings of the Nineteenth Conference on Computational Natural Language Learning

pdf bib
Do dependency parsing metrics correlate with human judgments?
Barbara Plank | Héctor Martínez Alonso | Željko Agić | Danijela Merkler | Anders Søgaard
Proceedings of the Nineteenth Conference on Computational Natural Language Learning

pdf bib
Reading behavior predicts syntactic categories
Maria Barrett | Anders Søgaard
Proceedings of the Nineteenth Conference on Computational Natural Language Learning

pdf bib
Mining for unambiguous instances to adapt part-of-speech taggers to new domains
Dirk Hovy | Barbara Plank | Héctor Martínez Alonso | Anders Søgaard
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Learning to parse with IAA-weighted loss
Héctor Martínez Alonso | Barbara Plank | Arne Skjærholt | Anders Søgaard
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Simple task-specific bilingual word embeddings
Stephan Gouws | Anders Søgaard
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

pdf bib
Learning part-of-speech taggers with inter-annotator agreement loss
Barbara Plank | Dirk Hovy | Anders Søgaard
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
What’s in a p-value in NLP?
Anders Søgaard | Anders Johannsen | Barbara Plank | Dirk Hovy | Hector Martínez Alonso
Proceedings of the Eighteenth Conference on Computational Natural Language Learning

pdf bib
Adapting taggers to Twitter with not-so-distant supervision
Barbara Plank | Dirk Hovy | Ryan McDonald | Anders Søgaard
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib
Selection Bias, Label Bias, and Bias in Ground Truth
Anders Søgaard | Barbara Plank | Dirk Hovy
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Tutorial Abstracts

pdf bib
More or less supervised supersense tagging of Twitter
Anders Johannsen | Dirk Hovy | Héctor Martínez Alonso | Barbara Plank | Anders Søgaard
Proceedings of the Third Joint Conference on Lexical and Computational Semantics (*SEM 2014)

pdf bib
Copenhagen-Malmö: Tree Approximations of Semantic Parsing Problems
Natalie Schluter | Anders Søgaard | Jakob Elming | Dirk Hovy | Barbara Plank | Héctor Martínez Alonso | Anders Johanssen | Sigrid Klerke
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib
Crowdsourcing and annotating NER for Twitter #drift
Hege Fromreide | Dirk Hovy | Anders Søgaard
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present two new NER datasets for Twitter; a manually annotated set of 1,467 tweets (kappa=0.942) and a set of 2,975 expert-corrected, crowdsourced NER annotated tweets from the dataset described in Finin et al. (2010). In our experiments with these datasets, we observe two important points: (a) language drift on Twitter is significant, and while off-the-shelf systems have been reported to perform well on in-sample data, they often perform poorly on new samples of tweets, (b) state-of-the-art performance across various datasets can be obtained from crowdsourced annotations, making it more feasible to “catch up” with language drift.

pdf bib
When POS data sets don’t add up: Combatting sample bias
Dirk Hovy | Barbara Plank | Anders Søgaard
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Several works in Natural Language Processing have recently looked into part-of-speech annotation of Twitter data and typically used their own data sets. Since conventions on Twitter change rapidly, models often show sample bias. Training on a combination of the existing data sets should help overcome this bias and produce more robust models than any trained on the individual corpora. Unfortunately, combining the existing corpora proves difficult: many of the corpora use proprietary tag sets that have little or no overlap. Even when mapped to a common tag set, the different corpora systematically differ in their treatment of various tags and tokens. This includes both pre-processing decisions, as well as default labels for frequent tokens, thus exhibiting data bias and label bias, respectively. Only if we address these biases can we combine the existing data sets to also overcome sample bias. We present a systematic study of several Twitter POS data sets, the problems of label and data bias, discuss their effects on model performance, and show how to overcome them to learn models that perform well on various test sets, achieving relative error reduction of up to 21%.

pdf bib
Importance weighting and unsupervised domain adaptation of POS taggers: a negative result
Barbara Plank | Anders Johannsen | Anders Søgaard
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Experiments with crowdsourced re-annotation of a POS tagging data set
Dirk Hovy | Barbara Plank | Anders Søgaard
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Linguistically debatable or just plain wrong?
Barbara Plank | Dirk Hovy | Anders Søgaard
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2013

pdf bib
With Blinkers on: Robust Prediction of Eye Movements across Readers
Franz Matthies | Anders Søgaard
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
Using Crowdsourcing to get Representations based on Regular Expressions
Anders Søgaard | Hector Martinez | Jakob Elming | Anders Johannsen
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
Cross-Domain Answer Ranking using Importance Sampling
Anders Johannsen | Anders Søgaard
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
Disambiguating Explicit Discourse Connectives without Oracles
Anders Johannsen | Anders Søgaard
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
Estimating effect size across datasets
Anders Søgaard
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Down-stream effects of tree-to-dependency conversions
Jakob Elming | Anders Johannsen | Sigrid Klerke | Emanuele Lapponi | Hector Martinez Alonso | Anders Søgaard
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Zipfian corruptions for robust POS tagging
Anders Søgaard
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
An Empirical Study of Differences between Conversion Schemes and Annotation Guidelines
Anders Søgaard
Proceedings of the Second International Conference on Dependency Linguistics (DepLing 2013)

pdf bib
Invited Keynote: 6,909 Reasons to Mess Up Your Data
Anders Søgaard
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

pdf bib
Part-of-speech tagging with antagonistic adversaries
Anders Søgaard
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Simple, readable sub-sentences
Sigrid Klerke | Anders Søgaard
51st Annual Meeting of the Association for Computational Linguistics Proceedings of the Student Research Workshop

2012

pdf bib
EMNLP@CPH: Is frequency all there is to simplicity?
Anders Johannsen | Héctor Martínez | Sigrid Klerke | Anders Søgaard
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

pdf bib
Proceedings of the Joint Workshop on Unsupervised and Semi-Supervised Learning in NLP
Omri Abend | Chris Biemann | Anna Korhonen | Ari Rappoport | Roi Reichart | Anders Søgaard
Proceedings of the Joint Workshop on Unsupervised and Semi-Supervised Learning in NLP

pdf bib
Two baselines for unsupervised dependency parsing
Anders Søgaard
Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure

pdf bib
Mining wisdom
Anders Søgaard
Proceedings of the NAACL-HLT 2012 Workshop on Computational Linguistics for Literature

pdf bib
DSim, a Danish Parallel Corpus for Text Simplification
Sigrid Klerke | Anders Søgaard
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We present DSim, a new sentence aligned Danish monolingual parallel corpus extracted from 3701 pairs of news telegrams and corresponding professionally simplified short news articles. The corpus is intended for building automatic text simplification for adult readers. We compare DSim to different examples of monolingual parallel corpora, and we argue that this corpus is a promising basis for future development of automatic data-driven text simplification systems in Danish. The corpus contains both the collection of paired articles and a sentence aligned bitext, and we show that sentence alignment using simple tf*idf weighted cosine similarity scoring is on line with state―of―the―art when evaluated against a hand-aligned sample. The alignment results are compared to state of the art for English sentence alignment. We finally compare the source and simplified sides of the corpus in terms of lexical and syntactic characteristics and readability, and find that the one―to―many sentence aligned corpus is representative of the sentence simplifications observed in the unaligned collection of article pairs.

pdf bib
Robust Learning in Random Subspaces: Equipping NLP for OOV Effects
Anders Søgaard | Anders Johannsen
Proceedings of COLING 2012: Posters

pdf bib
An Empirical Etudy of Non-Lexical Extensions to Delexicalized Transfer
Anders Søgaard | Julie Wulff
Proceedings of COLING 2012: Posters

2011

pdf bib
From ranked words to dependency trees: two-stage unsupervised non-projective dependency parsing
Anders Søgaard
Proceedings of TextGraphs-6: Graph-based Methods for Natural Language Processing

pdf bib
Shared Task System Description: Frustratingly Hard Compositionality Prediction
Anders Johannsen | Hector Martinez | Christian Rishøj | Anders Søgaard
Proceedings of the Workshop on Distributional Semantics and Compositionality

pdf bib
Factored Translation with Unsupervised Word Clusters
Christian Rishøj | Anders Søgaard
Proceedings of the Sixth Workshop on Statistical Machine Translation

pdf bib
Sentence-Level Instance-Weighting for Graph-Based and Transition-Based Dependency Parsing
Anders Søgaard | Martin Haulrich
Proceedings of the 12th International Conference on Parsing Technologies

pdf bib
Proceedings of Workshop on Robust Unsupervised and Semisupervised Methods in Natural Language Processing
Chris Biemann | Anders Søgaard
Proceedings of Workshop on Robust Unsupervised and Semisupervised Methods in Natural Language Processing

pdf bib
Using graphical models for PP attachment
Anders Søgaard
Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011)

pdf bib
Semi-supervised condensed nearest neighbor for part-of-speech tagging
Anders Søgaard
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Data point selection for cross-language adaptation of dependency parsers
Anders Søgaard
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

pdf bib
Can inversion transduction grammars generate hand alignments
Anders Søgaard
Proceedings of the 14th Annual conference of the European Association for Machine Translation

pdf bib
Simple Semi-Supervised Training of Part-Of-Speech Taggers
Anders Søgaard
Proceedings of the ACL 2010 Conference Short Papers

pdf bib
Semi-supervised dependency parsing using generalized tri-training
Anders Søgaard | Christian Rishøj
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

2009

pdf bib
Empirical Lower Bounds on Aligment Error Rates in Syntax-Based Machine Translation
Anders Søgaard | Jonas Kuhn
Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation (SSST-3) at NAACL HLT 2009

pdf bib
On the Complexity of Alignment Problems in Two Synchronous Grammar Formalisms
Anders Søgaard
Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation (SSST-3) at NAACL HLT 2009

pdf bib
Empirical lower bounds on translation unit error rate for the full class of inversion transduction grammars
Anders Søgaard | Dekai Wu
Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09)

pdf bib
Using a maximum entropy-based tagger to improve a very fast vine parser
Anders Søgaard | Jonas Kuhn
Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09)

pdf bib
A linear time extension of deterministic pushdown automata
Anders Søgaard
Proceedings of the 17th Nordic Conference of Computational Linguistics (NODALIDA 2009)

pdf bib
Verifying context-sensitive treebanks and heuristic parses in polynomial time
Anders Søgaard
Proceedings of the 17th Nordic Conference of Computational Linguistics (NODALIDA 2009)

2008

pdf bib
On the Weak Generative Capacity of Weighted Context-free Grammars
Anders Søgaard
Coling 2008: Companion volume: Posters

pdf bib
Range Concatenation Grammars for Translation
Anders Søgaard
Coling 2008: Companion volume: Posters

pdf bib
Learning context-sensitive synchronous rules
Anders Søgaard
Proceedings of the 12th Annual conference of the European Association for Machine Translation

2007

pdf bib
Polynomial Charts For Totally Unordered Languages
Anders Søgaard
Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA 2007)

2006

pdf bib
Embodied construction grammar as layered modal languages
Anders Sogaard
Proceedings of the Third Workshop on Scalable Natural Language Understanding

pdf bib
Functionality in grammar design
Anders Søgaard | Petter Haugereid
Proceedings of the 15th Nordic Conference of Computational Linguistics (NODALIDA 2005)

pdf bib
Logical investigations on the adequacy of certain feature-based theories of natural language
Anders Søgaard
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Doctoral Consortium

Search
Co-authors