Hinrich Schütze

Also published as: Hinrich Schuetze, Hinrich Schutze


2021

pdf bib
Proceedings of the First Workshop on Interactive Learning for Natural Language Processing
Kianté Brantley | Soham Dan | Iryna Gurevych | Ji-Ung Lee | Filip Radlinski | Hinrich Schütze | Edwin Simpson | Lili Yu
Proceedings of the First Workshop on Interactive Learning for Natural Language Processing

pdf bib
Identifying Automatically Generated Headlines using Transformers
Antonis Maronikolakis | Hinrich Schütze | Mark Stevenson
Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda

False information spread via the internet and social media influences public opinion and user activity, while generative models enable fake content to be generated faster and more cheaply than had previously been possible. In the not so distant future, identifying fake content generated by deep learning models will play a key role in protecting users from misinformation. To this end, a dataset containing human and computer-generated headlines was created and a user study indicated that humans were only able to identify the fake headlines in 47.8% of the cases. However, the most accurate automatic approach, transformers, achieved an overall accuracy of 85.7%, indicating that content generated from language models can be filtered out accurately.

pdf bib
Exploiting Cloze-Questions for Few-Shot Text Classification and Natural Language Inference
Timo Schick | Hinrich Schütze
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Some NLP tasks can be solved in a fully unsupervised fashion by providing a pretrained language model with “task descriptions” in natural language (e.g., Radford et al., 2019). While this approach underperforms its supervised counterpart, we show in this work that the two ideas can be combined: We introduce Pattern-Exploiting Training (PET), a semi-supervised training procedure that reformulates input examples as cloze-style phrases to help language models understand a given task. These phrases are then used to assign soft labels to a large set of unlabeled examples. Finally, standard supervised training is performed on the resulting training set. For several tasks and languages, PET outperforms supervised training and strong semi-supervised approaches in low-resource settings by a large margin.

pdf bib
Does She Wink or Does She Nod? A Challenging Benchmark for Evaluating Word Understanding of Language Models
Lutfi Kerem Senel | Hinrich Schütze
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Recent progress in pretraining language models on large corpora has resulted in significant performance gains on many NLP tasks. These large models acquire linguistic knowledge during pretraining, which helps to improve performance on downstream tasks via fine-tuning. To assess what kind of knowledge is acquired, language models are commonly probed by querying them with ‘fill in the blank’ style cloze questions. Existing probing datasets mainly focus on knowledge about relations between words and entities. We introduce WDLMPro (Word Definitions Language Model Probing) to evaluate word understanding directly using dictionary definitions of words. In our experiments, three popular pretrained language models struggle to match words and their definitions. This indicates that they understand many words poorly and that our new probing task is a difficult challenge that could help guide research on LMs in the future.

pdf bib
Language Models for Lexical Inference in Context
Martin Schmitt | Hinrich Schütze
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Lexical inference in context (LIiC) is the task of recognizing textual entailment between two very similar sentences, i.e., sentences that only differ in one expression. It can therefore be seen as a variant of the natural language inference task that is focused on lexical semantics. We formulate and evaluate the first approaches based on pretrained language models (LMs) for this task: (i) a few-shot NLI classifier, (ii) a relation induction approach based on handcrafted patterns expressing the semantics of lexical inference, and (iii) a variant of (ii) with patterns that were automatically extracted from a corpus. All our approaches outperform the previous state of the art, showing the potential of pretrained LMs for LIiC. In an extensive analysis, we investigate factors of success and failure of our three approaches.

pdf bib
Multilingual LAMA: Investigating Knowledge in Multilingual Pretrained Language Models
Nora Kassner | Philipp Dufter | Hinrich Schütze
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Recently, it has been found that monolingual English language models can be used as knowledge bases. Instead of structural knowledge base queries, masked sentences such as “Paris is the capital of [MASK]” are used as probes. We translate the established benchmarks TREx and GoogleRE into 53 languages. Working with mBERT, we investigate three questions. (i) Can mBERT be used as a multilingual knowledge base? Most prior work only considers English. Extending research to multiple languages is important for diversity and accessibility. (ii) Is mBERT’s performance as knowledge base language-independent or does it vary from language to language? (iii) A multilingual model is trained on more text, e.g., mBERT is trained on 104 Wikipedias. Can mBERT leverage this for better performance? We find that using mBERT as a knowledge base yields varying performance across languages and pooling predictions across languages improves performance. Conversely, mBERT exhibits a language bias; e.g., when queried in Italian, it tends to predict Italy as the country of origin.

pdf bib
It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
Timo Schick | Hinrich Schütze
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

When scaled to hundreds of billions of parameters, pretrained language models such as GPT-3 (Brown et al., 2020) achieve remarkable few-shot performance. However, enormous amounts of compute are required for training and applying such big models, resulting in a large carbon footprint and making it difficult for researchers and practitioners to use them. We show that performance similar to GPT-3 can be obtained with language models that are much “greener” in that their parameter count is several orders of magnitude smaller. This is achieved by converting textual inputs into cloze questions that contain a task description, combined with gradient-based optimization; exploiting unlabeled data gives further improvements. We identify key factors required for successful natural language understanding with small language models.

pdf bib
Static Embeddings as Efficient Knowledge Bases?
Philipp Dufter | Nora Kassner | Hinrich Schütze
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Recent research investigates factual knowledge stored in large pretrained language models (PLMs). Instead of structural knowledge base (KB) queries, masked sentences such as “Paris is the capital of [MASK]” are used as probes. The good performance on this analysis task has been interpreted as PLMs becoming potential repositories of factual knowledge. In experiments across ten linguistically diverse languages, we study knowledge contained in static embeddings. We show that, when restricting the output space to a candidate set, simple nearest neighbor matching using static embeddings performs better than PLMs. E.g., static embeddings perform 1.6% points better than BERT while just using 0.3% of energy for training. One important factor in their good comparative performance is that static embeddings are standardly learned for a large vocabulary. In contrast, BERT exploits its more sophisticated, but expensive ability to compose meaningful representations from a much smaller subword vocabulary.

pdf bib
Multi-source Neural Topic Modeling in Multi-view Embedding Spaces
Pankaj Gupta | Yatin Chaudhary | Hinrich Schütze
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Though word embeddings and topics are complementary representations, several past works have only used pretrained word embeddings in (neural) topic modeling to address data sparsity in short-text or small collection of documents. This work presents a novel neural topic modeling framework using multi-view embed ding spaces: (1) pretrained topic-embeddings, and (2) pretrained word-embeddings (context-insensitive from Glove and context-sensitive from BERT models) jointly from one or many sources to improve topic quality and better deal with polysemy. In doing so, we first build respective pools of pretrained topic (i.e., TopicPool) and word embeddings (i.e., WordPool). We then identify one or more relevant source domain(s) and transfer knowledge to guide meaningful learning in the sparse target domain. Within neural topic modeling, we quantify the quality of topics and document representations via generalization (perplexity), interpretability (topic coherence) and information retrieval (IR) using short-text, long-text, small and large document collections from news and medical domains. Introducing the multi-source multi-view embedding spaces, we have shown state-of-the-art neural topic modeling using 6 source (high-resource) and 5 target (low-resource) corpora.

pdf bib
Superbizarre Is Not Superb: Derivational Morphology Improves BERT’s Interpretation of Complex Words
Valentin Hofmann | Janet Pierrehumbert | Hinrich Schütze
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

How does the input segmentation of pretrained language models (PLMs) affect their interpretations of complex words? We present the first study investigating this question, taking BERT as the example PLM and focusing on its semantic representations of English derivatives. We show that PLMs can be interpreted as serial dual-route models, i.e., the meanings of complex words are either stored or else need to be computed from the subwords, which implies that maximally meaningful input tokens should allow for the best generalization on new words. This hypothesis is confirmed by a series of semantic probing tasks on which DelBERT (Derivation leveraging BERT), a model with derivational input segmentation, substantially outperforms BERT with WordPiece segmentation. Our results suggest that the generalization capabilities of PLMs could be further improved if a morphologically-informed vocabulary of input tokens were used.

pdf bib
A Closer Look at Few-Shot Crosslingual Transfer: The Choice of Shots Matters
Mengjie Zhao | Yi Zhu | Ehsan Shareghi | Ivan Vulić | Roi Reichart | Anna Korhonen | Hinrich Schütze
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Few-shot crosslingual transfer has been shown to outperform its zero-shot counterpart with pretrained encoders like multilingual BERT. Despite its growing popularity, little to no attention has been paid to standardizing and analyzing the design of few-shot experiments. In this work, we highlight a fundamental risk posed by this shortcoming, illustrating that the model exhibits a high degree of sensitivity to the selection of few shots. We conduct a large-scale experimental study on 40 sets of sampled few shots for six diverse NLP tasks across up to 40 languages. We provide an analysis of success and failure cases of few-shot transfer, which highlights the role of lexical features. Additionally, we show that a straightforward full model finetuning approach is quite effective for few-shot transfer, outperforming several state-of-the-art few-shot approaches. As a step towards standardizing few-shot crosslingual experimental designs, we make our sampled few shots publicly available.

pdf bib
Dynamic Contextualized Word Embeddings
Valentin Hofmann | Janet Pierrehumbert | Hinrich Schütze
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Static word embeddings that represent words by a single vector cannot capture the variability of word meaning in different linguistic and extralinguistic contexts. Building on prior work on contextualized and dynamic word embeddings, we introduce dynamic contextualized word embeddings that represent words as a function of both linguistic and extralinguistic context. Based on a pretrained language model (PLM), dynamic contextualized word embeddings model time and social space jointly, which makes them attractive for a range of NLP tasks involving semantic variability. We highlight potential application scenarios by means of qualitative and quantitative analyses on four English datasets.

pdf bib
ParCourE: A Parallel Corpus Explorer for a Massively Multilingual Corpus
Ayyoob ImaniGooghari | Masoud Jalili Sabet | Philipp Dufter | Michael Cysou | Hinrich Schütze
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations

With more than 7000 languages worldwide, multilingual natural language processing (NLP) is essential both from an academic and commercial perspective. Researching typological properties of languages is fundamental for progress in multilingual NLP. Examples include assessing language similarity for effective transfer learning, injecting inductive biases into machine learning models or creating resources such as dictionaries and inflection tables. We provide ParCourE, an online tool that allows to browse a word-aligned parallel corpus, covering 1334 languages. We give evidence that this is useful for typological research. ParCourE can be set up for any parallel corpus and can thus be used for typological research on other corpora as well as for exploring their quality and properties.

pdf bib
Multidomain Pretrained Language Models for Green NLP
Antonis Maronikolakis | Hinrich Schütze
Proceedings of the Second Workshop on Domain Adaptation for NLP

When tackling a task in a given domain, it has been shown that adapting a model to the domain using raw text data before training on the supervised task improves performance versus solely training on the task. The downside is that a lot of domain data is required and if we want to tackle tasks in n domains, we require n models each adapted on domain data before task learning. Storing and using these models separately can be prohibitive for low-end devices. In this paper we show that domain adaptation can be generalised to cover multiple domains. Specifically, a single model can be trained across various domains at the same time with minimal drop in performance, even when we use less data and resources. Thus, instead of training multiple models, we can train a single multidomain model saving on computational resources and training time.

pdf bib
Few-Shot Learning of an Interleaved Text Summarization Model by Pretraining with Synthetic Data
Sanjeev Kumar Karn | Francine Chen | Yan-Ying Chen | Ulli Waltinger | Hinrich Schütze
Proceedings of the Second Workshop on Domain Adaptation for NLP

Interleaved texts, where posts belonging to different threads occur in a sequence, commonly occur in online chat posts, so that it can be time-consuming to quickly obtain an overview of the discussions. Existing systems first disentangle the posts by threads and then extract summaries from those threads. A major issue with such systems is error propagation from the disentanglement component. While end-to-end trainable summarization system could obviate explicit disentanglement, such systems require a large amount of labeled data. To address this, we propose to pretrain an end-to-end trainable hierarchical encoder-decoder system using synthetic interleaved texts. We show that by fine-tuning on a real-world meeting dataset (AMI), such a system out-performs a traditional two-step system by 22%. We also compare against transformer models and observed that pretraining with synthetic data both the encoder and decoder outperforms the BertSumExtAbs transformer model which pretrains only the encoder on a large dataset.

pdf bib
Modeling Graph Structure via Relative Position for Text Generation from Knowledge Graphs
Martin Schmitt | Leonardo F. R. Ribeiro | Philipp Dufter | Iryna Gurevych | Hinrich Schütze
Proceedings of the Fifteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-15)

We present Graformer, a novel Transformer-based encoder-decoder architecture for graph-to-text generation. With our novel graph self-attention, the encoding of a node relies on all nodes in the input graph - not only direct neighbors - facilitating the detection of global patterns. We represent the relation between two nodes as the length of the shortest path between them. Graformer learns to weight these node-node relations differently for different attention heads, thus virtually learning differently connected views of the input graph. We evaluate Graformer on two popular graph-to-text generation benchmarks, AGENDA and WebNLG, where it achieves strong performance while using many fewer parameters than other approaches.

2020

pdf bib
Increasing Learning Efficiency of Self-Attention Networks through Direct Position Interactions, Learnable Temperature, and Convoluted Attention
Philipp Dufter | Martin Schmitt | Hinrich Schütze
Proceedings of the 28th International Conference on Computational Linguistics

Self-Attention Networks (SANs) are an integral part of successful neural architectures such as Transformer (Vaswani et al., 2017), and thus of pretrained language models such as BERT (Devlin et al., 2019) or GPT-3 (Brown et al., 2020). Training SANs on a task or pretraining them on language modeling requires large amounts of data and compute resources. We are searching for modifications to SANs that enable faster learning, i.e., higher accuracies after fewer update steps. We investigate three modifications to SANs: direct position interactions, learnable temperature, and convoluted attention. When evaluating them on part-of-speech tagging, we find that direct position interactions are an alternative to position embeddings, and convoluted attention has the potential to speed up the learning process.

pdf bib
Monolingual and Multilingual Reduction of Gender Bias in Contextualized Representations
Sheng Liang | Philipp Dufter | Hinrich Schütze
Proceedings of the 28th International Conference on Computational Linguistics

Pretrained language models (PLMs) learn stereotypes held by humans and reflected in text from their training corpora, including gender bias. When PLMs are used for downstream tasks such as picking candidates for a job, people’s lives can be negatively affected by these learned stereotypes. Prior work usually identifies a linear gender subspace and removes gender information by eliminating the subspace. Following this line of work, we propose to use DensRay, an analytical method for obtaining interpretable dense subspaces. We show that DensRay performs on-par with prior approaches, but provide arguments that it is more robust and provide indications that it preserves language model performance better. By applying DensRay to attention heads and layers of BERT we show that gender information is spread across all attention heads and most of the layers. Also we show that DensRay can obtain gender bias scores on both token and sentence levels. Finally, we demonstrate that we can remove bias multilingually, e.g., from Chinese, using only English training data.

pdf bib
Automatically Identifying Words That Can Serve as Labels for Few-Shot Text Classification
Timo Schick | Helmut Schmid | Hinrich Schütze
Proceedings of the 28th International Conference on Computational Linguistics

A recent approach for few-shot text classification is to convert textual inputs to cloze questions that contain some form of task description, process them with a pretrained language model and map the predicted words to labels. Manually defining this mapping between words and labels requires both domain expertise and an understanding of the language model’s abilities. To mitigate this issue, we devise an approach that automatically finds such a mapping given small amounts of training data. For a number of tasks, the mapping found by our approach performs almost as well as hand-crafted label-to-word mappings.

pdf bib
Combining Word Embeddings with Bilingual Orthography Embeddings for Bilingual Dictionary Induction
Silvia Severini | Viktor Hangya | Alexander Fraser | Hinrich Schütze
Proceedings of the 28th International Conference on Computational Linguistics

Bilingual dictionary induction (BDI) is the task of accurately translating words to the target language. It is of great importance in many low-resource scenarios where cross-lingual training data is not available. To perform BDI, bilingual word embeddings (BWEs) are often used due to their low bilingual training signal requirements. They achieve high performance, but problematic cases still remain, such as the translation of rare words or named entities, which often need to be transliterated. In this paper, we enrich BWE-based BDI with transliteration information by using Bilingual Orthography Embeddings (BOEs). BOEs represent source and target language transliteration word pairs with similar vectors. A key problem in our BDI setup is to decide which information source – BWEs (or semantics) vs. BOEs (or orthography) – is more reliable for a particular word pair. We propose a novel classification-based BDI system that uses BWEs, BOEs and a number of other features to make this decision. We test our system on English-Russian BDI and show improved performance. In addition, we show the effectiveness of our BOEs by successfully using them for transliteration mining based on cosine similarity.

pdf bib
Embedding Space Correlation as a Measure of Domain Similarity
Anne Beyer | Göran Kauermann | Hinrich Schütze
Proceedings of the 12th Language Resources and Evaluation Conference

Prior work has determined domain similarity using text-based features of a corpus. However, when using pre-trained word embeddings, the underlying text corpus might not be accessible anymore. Therefore, we propose the CCA measure, a new measure of domain similarity based directly on the dimension-wise correlations between corresponding embedding spaces. Our results suggest that an inherent notion of domain can be captured this way, as we are able to reproduce our findings for different domain comparisons for English, German, Spanish and Czech as well as in cross-lingual comparisons. We further find a threshold at which the CCA measure indicates that two corpora come from the same domain in a monolingual setting by applying permutation tests. By evaluating the usability of the CCA measure in a domain adaptation application, we also show that it can be used to determine which corpora are more similar to each other in a cross-domain sentiment detection task.

pdf bib
ThaiLMCut: Unsupervised Pretraining for Thai Word Segmentation
Suteera Seeha | Ivan Bilan | Liliana Mamani Sanchez | Johannes Huber | Michael Matuschek | Hinrich Schütze
Proceedings of the 12th Language Resources and Evaluation Conference

We propose ThaiLMCut, a semi-supervised approach for Thai word segmentation which utilizes a bi-directional character language model (LM) as a way to leverage useful linguistic knowledge from unlabeled data. After the language model is trained on substantial unlabeled corpora, the weights of its embedding and recurrent layers are transferred to a supervised word segmentation model which continues fine-tuning them on a word segmentation task. Our experimental results demonstrate that applying the LM always leads to a performance gain, especially when the amount of labeled data is small. In such cases, the F1 Score increased by up to 2.02%. Even on abig labeled dataset, a small improvement gain can still be obtained. The approach has also shown to be very beneficial for out-of-domain settings with a gain in F1 Score of up to 3.13%. Finally, we show that ThaiLMCut can outperform other open source state-of-the-art models achieving an F1 Score of 98.78% on the standard benchmark, InterBEST2009.

pdf bib
Are Pretrained Language Models Symbolic Reasoners over Knowledge?
Nora Kassner | Benno Krojer | Hinrich Schütze
Proceedings of the 24th Conference on Computational Natural Language Learning

How can pretrained language models (PLMs) learn factual knowledge from the training set? We investigate the two most important mechanisms: reasoning and memorization. Prior work has attempted to quantify the number of facts PLMs learn, but we present, using synthetic data, the first study that investigates the causal relation between facts present in training and facts learned by the PLM. For reasoning, we show that PLMs seem to learn to apply some symbolic reasoning rules correctly but struggle with others, including two-hop reasoning. Further analysis suggests that even the application of learned reasoning rules is flawed. For memorization, we identify schema conformity (facts systematically supported by other facts) and frequency as key factors for its success.

pdf bib
E-BERT: Efficient-Yet-Effective Entity Embeddings for BERT
Nina Poerner | Ulli Waltinger | Hinrich Schütze
Findings of the Association for Computational Linguistics: EMNLP 2020

We present a novel way of injecting factual knowledge about entities into the pretrained BERT model (Devlin et al., 2019): We align Wikipedia2Vec entity vectors (Yamada et al., 2016) with BERT’s native wordpiece vector space and use the aligned entity vectors as if they were wordpiece vectors. The resulting entity-enhanced version of BERT (called E-BERT) is similar in spirit to ERNIE (Zhang et al., 2019) and KnowBert (Peters et al., 2019), but it requires no expensive further pre-training of the BERT encoder. We evaluate E-BERT on unsupervised question answering (QA), supervised relation classification (RC) and entity linking (EL). On all three tasks, E-BERT outperforms BERT and other baselines. We also show quantitatively that the original BERT model is overly reliant on the surface form of entity names (e.g., guessing that someone with an Italian-sounding name speaks Italian), and that E-BERT mitigates this problem.

pdf bib
Quantifying the Contextualization of Word Representations with Semantic Class Probing
Mengjie Zhao | Philipp Dufter | Yadollah Yaghoobzadeh | Hinrich Schütze
Findings of the Association for Computational Linguistics: EMNLP 2020

Pretrained language models achieve state-of-the-art results on many NLP tasks, but there are still many open questions about how and why they work so well. We investigate the contextualization of words in BERT. We quantify the amount of contextualization, i.e., how well words are interpreted in context, by studying the extent to which semantic classes of a word can be inferred from its contextualized embedding. Quantifying contextualization helps in understanding and utilizing pretrained language models. We show that the top layer representations support highly accurate inference of semantic classes; that the strongest contextualization effects occur in the lower layers; that local context is mostly sufficient for contextualizing words; and that top layer representations are more task-specific after finetuning while lower layer representations are more transferable. Finetuning uncovers task-related features, but pretrained knowledge about contextualization is still well preserved.

pdf bib
Inexpensive Domain Adaptation of Pretrained Language Models: Case Studies on Biomedical NER and Covid-19 QA
Nina Poerner | Ulli Waltinger | Hinrich Schütze
Findings of the Association for Computational Linguistics: EMNLP 2020

Domain adaptation of Pretrained Language Models (PTLMs) is typically achieved by unsupervised pretraining on target-domain text. While successful, this approach is expensive in terms of hardware, runtime and CO 2 emissions. Here, we propose a cheaper alternative: We train Word2Vec on target-domain text and align the resulting word vectors with the wordpiece vectors of a general-domain PTLM. We evaluate on eight English biomedical Named Entity Recognition (NER) tasks and compare against the recently proposed BioBERT model. We cover over 60% of the BioBERT - BERT F1 delta, at 5% of BioBERT’s CO 2 footprint and 2% of its cloud compute cost. We also show how to quickly adapt an existing general-domain Question Answering (QA) model to an emerging domain: the Covid-19 pandemic.

pdf bib
SimAlign: High Quality Word Alignments Without Parallel Training Data Using Static and Contextualized Embeddings
Masoud Jalili Sabet | Philipp Dufter | François Yvon | Hinrich Schütze
Findings of the Association for Computational Linguistics: EMNLP 2020

Word alignments are useful for tasks like statistical and neural machine translation (NMT) and cross-lingual annotation projection. Statistical word aligners perform well, as do methods that extract alignments jointly with translations in NMT. However, most approaches require parallel training data and quality decreases as less training data is available. We propose word alignment methods that require no parallel data. The key idea is to leverage multilingual word embeddings – both static and contextualized – for word alignment. Our multilingual embeddings are created from monolingual data only without relying on any parallel data or dictionaries. We find that alignments created from embeddings are superior for four and comparable for two language pairs compared to those produced by traditional statistical aligners – even with abundant parallel data; e.g., contextualized embeddings achieve a word alignment F1 for English-German that is 5 percentage points higher than eflomal, a high-quality statistical aligner, trained on 100k parallel sentences.

pdf bib
TopicBERT for Energy Efficient Document Classification
Yatin Chaudhary | Pankaj Gupta | Khushbu Saxena | Vivek Kulkarni | Thomas Runkler | Hinrich Schütze
Findings of the Association for Computational Linguistics: EMNLP 2020

Prior research notes that BERT’s computational cost grows quadratically with sequence length thus leading to longer training times, higher GPU memory constraints and carbon emissions. While recent work seeks to address these scalability issues at pre-training, these issues are also prominent in fine-tuning especially for long sequence tasks like document classification. Our work thus focuses on optimizing the computational cost of fine-tuning for document classification. We achieve this by complementary learning of both topic and language models in a unified framework, named TopicBERT. This significantly reduces the number of self-attention operations – a main performance bottleneck. Consequently, our model achieves a 1.4x ( 40%) speedup with 40% reduction in CO2 emission while retaining 99.9% performance over 5 datasets.

pdf bib
BERT-kNN: Adding a kNN Search Component to Pretrained Language Models for Better QA
Nora Kassner | Hinrich Schütze
Findings of the Association for Computational Linguistics: EMNLP 2020

Khandelwal et al. (2020) use a k-nearest-neighbor (kNN) component to improve language model performance. We show that this idea is beneficial for open-domain question answering (QA). To improve the recall of facts encountered during training, we combine BERT (Devlin et al., 2019) with a traditional information retrieval step (IR) and a kNN search over a large datastore of an embedded text collection. Our contributions are as follows: i) BERT-kNN outperforms BERT on cloze-style QA by large margins without any further training. ii) We show that BERT often identifies the correct response category (e.g., US city), but only kNN recovers the factually correct answer (e.g.,“Miami”). iii) Compared to BERT, BERT-kNN excels for rare facts. iv) BERT-kNN can easily handle facts not covered by BERT’s training set, e.g., recent events.

pdf bib
LMU Bilingual Dictionary Induction System with Word Surface Similarity Scores for BUCC 2020
Silvia Severini | Viktor Hangya | Alexander Fraser | Hinrich Schütze
Proceedings of the 13th Workshop on Building and Using Comparable Corpora

The task of Bilingual Dictionary Induction (BDI) consists of generating translations for source language words which is important in the framework of machine translation (MT). The aim of the BUCC 2020 shared task is to perform BDI on various language pairs using comparable corpora. In this paper, we present our approach to the task of English-German and English-Russian language pairs. Our system relies on Bilingual Word Embeddings (BWEs) which are often used for BDI when only a small seed lexicon is available making them particularly effective in a low-resource setting. On the other hand, they perform well on high frequency words only. In order to improve the performance on rare words as well, we combine BWE based word similarity with word surface similarity methods, such as orthography In addition to the often used top-n translation method, we experiment with a margin based approach aiming for dynamic number of translations for each source word. We participate in both the open and closed tracks of the shared task and we show improved results of our method compared to simple vector similarity based approaches. Our system was ranked in the top-3 teams and achieved the best results for English-Russian.

pdf bib
A Graph Auto-encoder Model of Derivational Morphology
Valentin Hofmann | Hinrich Schütze | Janet Pierrehumbert
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

There has been little work on modeling the morphological well-formedness (MWF) of derivatives, a problem judged to be complex and difficult in linguistics. We present a graph auto-encoder that learns embeddings capturing information about the compatibility of affixes and stems in derivation. The auto-encoder models MWF in English surprisingly well by combining syntactic and semantic information with associative information from the mental lexicon.

pdf bib
BERTRAM: Improved Word Embeddings Have Big Impact on Contextualized Model Performance
Timo Schick | Hinrich Schütze
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Pretraining deep language models has led to large performance gains in NLP. Despite this success, Schick and Schütze (2020) recently showed that these models struggle to understand rare words. For static word embeddings, this problem has been addressed by separately learning representations for rare words. In this work, we transfer this idea to pretrained language models: We introduce BERTRAM, a powerful architecture based on BERT that is capable of inferring high-quality embeddings for rare words that are suitable as input representations for deep language models. This is achieved by enabling the surface form and contexts of a word to interact with each other in a deep architecture. Integrating BERTRAM into BERT leads to large performance increases due to improved representations of rare and medium frequency words on both a rare word probing task and three downstream tasks.

pdf bib
Sentence Meta-Embeddings for Unsupervised Semantic Textual Similarity
Nina Poerner | Ulli Waltinger | Hinrich Schütze
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We address the task of unsupervised Semantic Textual Similarity (STS) by ensembling diverse pre-trained sentence encoders into sentence meta-embeddings. We apply, extend and evaluate different meta-embedding methods from the word embedding literature at the sentence level, including dimensionality reduction (Yin and Schütze, 2016), generalized Canonical Correlation Analysis (Rastogi et al., 2015) and cross-view auto-encoders (Bollegala and Bao, 2018). Our sentence meta-embeddings set a new unsupervised State of The Art (SoTA) on the STS Benchmark and on the STS12-STS16 datasets, with gains of between 3.7% and 6.4% Pearson’s r over single-source systems.

pdf bib
Predicting the Growth of Morphological Families from Social and Linguistic Factors
Valentin Hofmann | Janet Pierrehumbert | Hinrich Schütze
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We present the first study that examines the evolution of morphological families, i.e., sets of morphologically related words such as “trump”, “antitrumpism”, and “detrumpify”, in social media. We introduce the novel task of Morphological Family Expansion Prediction (MFEP) as predicting the increase in the size of a morphological family. We create a ten-year Reddit corpus as a benchmark for MFEP and evaluate a number of baselines on this benchmark. Our experiments demonstrate very good performance on MFEP.

pdf bib
Negated and Misprimed Probes for Pretrained Language Models: Birds Can Talk, But Cannot Fly
Nora Kassner | Hinrich Schütze
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Building on Petroni et al. 2019, we propose two new probing tasks analyzing factual knowledge stored in Pretrained Language Models (PLMs). (1) Negation. We find that PLMs do not distinguish between negated (‘‘Birds cannot [MASK]”) and non-negated (‘‘Birds can [MASK]”) cloze questions. (2) Mispriming. Inspired by priming methods in human psychology, we add “misprimes” to cloze questions (‘‘Talk? Birds can [MASK]”). We find that PLMs are easily distracted by misprimes. These results suggest that PLMs still have a long way to go to adequately learn human-like factual knowledge.

pdf bib
Masking as an Efficient Alternative to Finetuning for Pretrained Language Models
Mengjie Zhao | Tao Lin | Fei Mi | Martin Jaggi | Hinrich Schütze
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We present an efficient method of utilizing pretrained language models, where we learn selective binary masks for pretrained weights in lieu of modifying them through finetuning. Extensive evaluations of masking BERT, RoBERTa, and DistilBERT on eleven diverse NLP tasks show that our masking scheme yields performance comparable to finetuning, yet has a much smaller memory footprint when several tasks need to be inferred. Intrinsic evaluations show that representations computed by our binary masked language models encode information necessary for solving downstream tasks. Analyzing the loss landscape, we show that masking and finetuning produce models that reside in minima that can be connected by a line segment with nearly constant test accuracy. This confirms that masking can be utilized as an efficient alternative to finetuning.

pdf bib
DagoBERT: Generating Derivational Morphology with a Pretrained Language Model
Valentin Hofmann | Janet Pierrehumbert | Hinrich Schütze
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Can pretrained language models (PLMs) generate derivationally complex words? We present the first study investigating this question, taking BERT as the example PLM. We examine BERT’s derivational capabilities in different settings, ranging from using the unmodified pretrained model to full finetuning. Our best model, DagoBERT (Derivationally and generatively optimized BERT), clearly outperforms the previous state of the art in derivation generation (DG). Furthermore, our experiments show that the input segmentation crucially impacts BERT’s derivational knowledge, suggesting that the performance of PLMs could be further improved if a morphologically informed vocabulary of units were used.

pdf bib
Identifying Elements Essential for BERT’s Multilinguality
Philipp Dufter | Hinrich Schütze
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

It has been shown that multilingual BERT (mBERT) yields high quality multilingual representations and enables effective zero-shot transfer. This is surprising given that mBERT does not use any crosslingual signal during training. While recent literature has studied this phenomenon, the reasons for the multilinguality are still somewhat obscure. We aim to identify architectural properties of BERT and linguistic properties of languages that are necessary for BERT to become multilingual. To allow for fast experimentation we propose an efficient setup with small BERT models trained on a mix of synthetic and natural data. Overall, we identify four architectural and two linguistic elements that influence multilinguality. Based on our insights, we experiment with a multilingual pretraining setup that modifies the masking strategy using VecMap, i.e., unsupervised embedding alignment. Experiments on XNLI with three languages indicate that our findings transfer from our small setup to larger scale settings.

pdf bib
An Unsupervised Joint System for Text Generation from Knowledge Graphs and Semantic Parsing
Martin Schmitt | Sahand Sharifzadeh | Volker Tresp | Hinrich Schütze
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Knowledge graphs (KGs) can vary greatly from one domain to another. Therefore supervised approaches to both graph-to-text generation and text-to-graph knowledge extraction (semantic parsing) will always suffer from a shortage of domain-specific parallel graph-text data; at the same time, adapting a model trained on a different domain is often impossible due to little or no overlap in entities and relations. This situation calls for an approach that (1) does not need large amounts of annotated data and thus (2) does not need to rely on domain adaptation techniques to work well on different domains. To this end, we present the first approach to unsupervised text generation from KGs and show simultaneously how it can be used for unsupervised semantic parsing. We evaluate our approach on WebNLG v2.1 and a new benchmark leveraging scene graphs from Visual Genome. Our system outperforms strong baselines for both text<->graph conversion tasks without any manual adaptation from one dataset to the other. In additional experiments, we investigate the impact of using different unsupervised objectives.

pdf bib
EmbLexChange at SemEval-2020 Task 1: Unsupervised Embedding-based Detection of Lexical Semantic Changes
Ehsaneddin Asgari | Christoph Ringlstetter | Hinrich Schütze
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes EmbLexChange, a system introduced by the “Life-Language” team for SemEval-2020 Task 1, on unsupervised detection of lexical-semantic changes. EmbLexChange is defined as the divergence between the embedding based profiles of word w (calculated with respect to a set of reference words) in the source and the target domains (source and target domains can be simply two time frames t_1 and t_2). The underlying assumption is that the lexical-semantic change of word w would affect its co-occurring words and subsequently alters the neighborhoods in the embedding spaces. We show that using a resampling framework for the selection of reference words (with conserved senses), we can more reliably detect lexical-semantic changes in English, German, Swedish, and Latin. EmbLexChange achieved second place in the binary detection of semantic changes in the SemEval-2020.

2019

pdf bib
Analytical Methods for Interpretable Ultradense Word Embeddings
Philipp Dufter | Hinrich Schütze
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Word embeddings are useful for a wide variety of tasks, but they lack interpretability. By rotating word spaces, interpretable dimensions can be identified while preserving the information contained in the embeddings without any loss. In this work, we investigate three methods for making word spaces interpretable by rotation: Densifier (Rothe et al., 2016), linear SVMs and DensRay, a new method we propose. In contrast to Densifier, DensRay can be computed in closed form, is hyperparameter-free and thus more robust than Densifier. We evaluate the three methods on lexicon induction and set-based word analogy. In addition we provide qualitative insights as to how interpretable word spaces can be used for removing gender bias from embeddings.

pdf bib
Multi-View Domain Adapted Sentence Embeddings for Low-Resource Unsupervised Duplicate Question Detection
Nina Poerner | Hinrich Schütze
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We address the problem of Duplicate Question Detection (DQD) in low-resource domain-specific Community Question Answering forums. Our multi-view framework MV-DASE combines an ensemble of sentence encoders via Generalized Canonical Correlation Analysis, using unlabeled data only. In our experiments, the ensemble includes generic and domain-specific averaged word embeddings, domain-finetuned BERT and the Universal Sentence Encoder. We evaluate MV-DASE on the CQADupStack corpus and on additional low-resource Stack Exchange forums. Combining the strengths of different encoders, we significantly outperform BM25, all single-view systems as well as a recent supervised domain-adversarial DQD method.

pdf bib
Neural Architectures for Fine-Grained Propaganda Detection in News
Pankaj Gupta | Khushbu Saxena | Usama Yaseen | Thomas Runkler | Hinrich Schütze
Proceedings of the Second Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda

This paper describes our system (MIC-CIS) details and results of participation in the fine grained propaganda detection shared task 2019. To address the tasks of sentence (SLC) and fragment level (FLC) propaganda detection, we explore different neural architectures (e.g., CNN, LSTM-CRF and BERT) and extract linguistic (e.g., part-of-speech, named entity, readability, sentiment, emotion, etc.), layout and topical features. Specifically, we have designed multi-granularity and multi-tasking neural architectures to jointly perform both the sentence and fragment level propaganda detection. Additionally, we investigate different ensemble schemes such as majority-voting, relax-voting, etc. to boost overall system performance. Compared to the other participating systems, our submissions are ranked 3rd and 4th in FLC and SLC tasks, respectively.

pdf bib
Linguistically Informed Relation Extraction and Neural Architectures for Nested Named Entity Recognition in BioNLP-OST 2019
Pankaj Gupta | Usama Yaseen | Hinrich Schütze
Proceedings of The 5th Workshop on BioNLP Open Shared Tasks

Named Entity Recognition (NER) and Relation Extraction (RE) are essential tools in distilling knowledge from biomedical literature. This paper presents our findings from participating in BioNLP Shared Tasks 2019. We addressed Named Entity Recognition including nested entities extraction, Entity Normalization and Relation Extraction. Our proposed approach of Named Entities can be generalized to different languages and we have shown it’s effectiveness for English and Spanish text. We investigated linguistic features, hybrid loss including ranking and Conditional Random Fields (CRF), multi-task objective and token level ensembling strategy to improve NER. We employed dictionary based fuzzy and semantic search to perform Entity Normalization. Finally, our RE system employed Support Vector Machine (SVM) with linguistic features. Our NER submission (team:MIC-CIS) ranked first in BB-2019 norm+NER task with standard error rate (SER) of 0.7159 and showed competitive performance on PharmaCo NER task with F1-score of 0.8662. Our RE system ranked first in the SeeDev-binary Relation Extraction Task with F1-score of 0.3738.

pdf bib
BioNLP-OST 2019 RDoC Tasks: Multi-grain Neural Relevance Ranking Using Topics and Attention Based Query-Document-Sentence Interactions
Pankaj Gupta | Yatin Chaudhary | Hinrich Schütze
Proceedings of The 5th Workshop on BioNLP Open Shared Tasks

This paper presents our system details and results of participation in the RDoC Tasks of BioNLP-OST 2019. Research Domain Criteria (RDoC) construct is a multi-dimensional and broad framework to describe mental health disorders by combining knowledge from genomics to behaviour. Non-availability of RDoC labelled dataset and tedious labelling process hinders the use of RDoC framework to reach its full potential in Biomedical research community and Healthcare industry. Therefore, Task-1 aims at retrieval and ranking of PubMed abstracts relevant to a given RDoC construct and Task-2 aims at extraction of the most relevant sentence from a given PubMed abstract. We investigate (1) attention based supervised neural topic model and SVM for retrieval and ranking of PubMed abstracts and, further utilize BM25 and other relevance measures for re-ranking, (2) supervised and unsupervised sentence ranking models utilizing multi-view representations comprising of query-aware attention-based sentence representation (QAR), bag-of-words (BoW) and TF-IDF. Our best systems achieved 1st rank and scored 0.86 mAP and 0.58 macro average accuracy in Task-1 and Task-2 respectively.

pdf bib
Attentive Mimicking: Better Word Embeddings by Attending to Informative Contexts
Timo Schick | Hinrich Schütze
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Learning high-quality embeddings for rare words is a hard problem because of sparse context information. Mimicking (Pinter et al., 2017) has been proposed as a solution: given embeddings learned by a standard algorithm, a model is first trained to reproduce embeddings of frequent words from their surface form and then used to compute embeddings for rare words. In this paper, we introduce attentive mimicking: the mimicking model is given access not only to a word’s surface form, but also to all available contexts and learns to attend to the most informative and reliable contexts for computing an embedding. In an evaluation on four tasks, we show that attentive mimicking outperforms previous work for both rare and medium-frequency words. Thus, compared to previous work, attentive mimicking improves embeddings for a much larger part of the vocabulary, including the medium-frequency range.

pdf bib
Neural Semi-Markov Conditional Random Fields for Robust Character-Based Part-of-Speech Tagging
Apostolos Kemos | Heike Adel | Hinrich Schütze
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Character-level models of tokens have been shown to be effective at dealing with within-token noise and out-of-vocabulary words. However, they often still rely on correct token boundaries. In this paper, we propose to eliminate the need for tokenizers with an end-to-end character-level semi-Markov conditional random field. It uses neural networks for its character and segment representations. We demonstrate its effectiveness in multilingual settings and when token boundaries are noisy: It matches state-of-the-art part-of-speech taggers for various languages and significantly outperforms them on a noisy English version of a benchmark dataset. Our code and the noisy dataset are publicly available at http://cistern.cis.lmu.de/semiCRF.

pdf bib
News Article Teaser Tweets and How to Generate Them
Sanjeev Kumar Karn | Mark Buckley | Ulli Waltinger | Hinrich Schütze
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

In this work, we define the task of teaser generation and provide an evaluation benchmark and baseline systems for the process of generating teasers. A teaser is a short reading suggestion for an article that is illustrative and includes curiosity-arousing elements to entice potential readers to read particular news items. Teasers are one of the main vehicles for transmitting news to social media users. We compile a novel dataset of teasers by systematically accumulating tweets and selecting those that conform to the teaser definition. We have compared a number of neural abstractive architectures on the task of teaser generation and the overall best performing system is See et al. seq2seq with pointer network.

pdf bib
Towards Summarization for Social Media - Results of the TL;DR Challenge
Shahbaz Syed | Michael Völske | Nedim Lipka | Benno Stein | Hinrich Schütze | Martin Potthast
Proceedings of the 12th International Conference on Natural Language Generation

In this paper, we report on the results of the TL;DR challenge, discussing an extensive manual evaluation of the expected properties of a good summary based on analyzing the comments provided by human annotators.

pdf bib
Automatic Domain Adaptation Outperforms Manual Domain Adaptation for Predicting Financial Outcomes
Marina Sedinkina | Nikolas Breitkopf | Hinrich Schütze
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In this paper, we automatically create sentiment dictionaries for predicting financial outcomes. We compare three approaches: (i) manual adaptation of the domain-general dictionary H4N, (ii) automatic adaptation of H4N and (iii) a combination consisting of first manual, then automatic adaptation. In our experiments, we demonstrate that the automatically adapted sentiment dictionary outperforms the previous state of the art in predicting the financial outcomes excess return and volatility. In particular, automatic adaptation performs better than manual adaptation. In our analysis, we find that annotation based on an expert’s a priori belief about a word’s meaning can be incorrect – annotation should be performed based on the word’s contexts in the target domain instead.

pdf bib
SherLIiC: A Typed Event-Focused Lexical Inference Benchmark for Evaluating Natural Language Inference
Martin Schmitt | Hinrich Schütze
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We present SherLIiC, a testbed for lexical inference in context (LIiC), consisting of 3985 manually annotated inference rule candidates (InfCands), accompanied by (i) ~960k unlabeled InfCands, and (ii) ~190k typed textual relations between Freebase entities extracted from the large entity-linked corpus ClueWeb09. Each InfCand consists of one of these relations, expressed as a lemmatized dependency path, and two argument placeholders, each linked to one or more Freebase types. Due to our candidate selection process based on strong distributional evidence, SherLIiC is much harder than existing testbeds because distributional evidence is of little utility in the classification of InfCands. We also show that, due to its construction, many of SherLIiC’s correct InfCands are novel and missing from existing rule bases. We evaluate a large number of strong baselines on SherLIiC, ranging from semantic vector space models to state of the art neural models of natural language inference (NLI). We show that SherLIiC poses a tough challenge to existing NLI systems.

pdf bib
A Multilingual BPE Embedding Space for Universal Sentiment Lexicon Induction
Mengjie Zhao | Hinrich Schütze
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We present a new method for sentiment lexicon induction that is designed to be applicable to the entire range of typological diversity of the world’s languages. We evaluate our method on Parallel Bible Corpus+ (PBC+), a parallel corpus of 1593 languages. The key idea is to use Byte Pair Encodings (BPEs) as basic units for multilingual embeddings. Through zero-shot transfer from English sentiment, we learn a seed lexicon for each language in the domain of PBC+. Through domain adaptation, we then generalize the domain-specific lexicon to a general one. We show – across typologically diverse languages in PBC+ – good quality of seed and general-domain sentiment lexicons by intrinsic and extrinsic and by automatic and human evaluation. We make freely available our code, seed sentiment lexicons for all 1593 languages and induced general-domain sentiment lexicons for 200 languages.

pdf bib
Probing for Semantic Classes: Diagnosing the Meaning Content of Word Embeddings
Yadollah Yaghoobzadeh | Katharina Kann | T. J. Hazen | Eneko Agirre | Hinrich Schütze
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Word embeddings typically represent different meanings of a word in a single conflated vector. Empirical analysis of embeddings of ambiguous words is currently limited by the small size of manually annotated resources and by the fact that word senses are treated as unrelated individual concepts. We present a large dataset based on manual Wikipedia annotations and word senses, where word senses from different words are related by semantic classes. This is the basis for novel diagnostic tests for an embedding’s content: we probe word embeddings for semantic classes and analyze the embedding space by classifying embeddings into semantic classes. Our main findings are: (i) Information about a sense is generally represented well in a single-vector embedding – if the sense is frequent. (ii) A classifier can accurately predict whether a word is single-sense or multi-sense, based only on its embedding. (iii) Although rare senses are not well represented in single-vector embeddings, this does not have negative impact on an NLP application whose performance depends on frequent senses.

2018

pdf bib
Recurrent One-Hop Predictions for Reasoning over Knowledge Graphs
Wenpeng Yin | Yadollah Yaghoobzadeh | Hinrich Schütze
Proceedings of the 27th International Conference on Computational Linguistics

Large scale knowledge graphs (KGs) such as Freebase are generally incomplete. Reasoning over multi-hop (mh) KG paths is thus an important capability that is needed for question answering or other NLP tasks that require knowledge about the world. mh-KG reasoning includes diverse scenarios, e.g., given a head entity and a relation path, predict the tail entity; or given two entities connected by some relation paths, predict the unknown relation between them. We present ROPs, recurrent one-hop predictors, that predict entities at each step of mh-KB paths by using recurrent neural networks and vector representations of entities and relations, with two benefits: (i) modeling mh-paths of arbitrary lengths while updating the entity and relation representations by the training signal at each step; (ii) handling different types of mh-KG reasoning in a unified framework. Our models show state-of-the-art for two important multi-hop KG reasoning tasks: Knowledge Base Completion and Path Query Answering.

pdf bib
Joint Bootstrapping Machines for High Confidence Relation Extraction
Pankaj Gupta | Benjamin Roth | Hinrich Schütze
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Semi-supervised bootstrapping techniques for relationship extraction from text iteratively expand a set of initial seed instances. Due to the lack of labeled data, a key challenge in bootstrapping is semantic drift: if a false positive instance is added during an iteration, then all following iterations are contaminated. We introduce BREX, a new bootstrapping method that protects against such contamination by highly effective confidence assessment. This is achieved by using entity and template seeds jointly (as opposed to just one as in previous work), by expanding entities and templates in parallel and in a mutually constraining fashion in each iteration and by introducing higherquality similarity measures for templates. Experimental results show that BREX achieves an F1 that is 0.13 (0.87 vs. 0.74) better than the state of the art for four relationships.

pdf bib
Fortification of Neural Morphological Segmentation Models for Polysynthetic Minimal-Resource Languages
Katharina Kann | Jesus Manuel Mager Hois | Ivan Vladimir Meza-Ruiz | Hinrich Schütze
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Morphological segmentation for polysynthetic languages is challenging, because a word may consist of many individual morphemes and training data can be extremely scarce. Since neural sequence-to-sequence (seq2seq) models define the state of the art for morphological segmentation in high-resource settings and for (mostly) European languages, we first show that they also obtain competitive performance for Mexican polysynthetic languages in minimal-resource settings. We then propose two novel multi-task training approaches—one with, one without need for external unlabeled resources—, and two corresponding data augmentation methods, improving over the neural baseline for all languages. Finally, we explore cross-lingual transfer as a third way to fortify our neural model and show that we can train one single multi-lingual model for related languages while maintaining comparable or even improved performance, thus reducing the amount of parameters by close to 75%. We provide our morphological segmentation datasets for Mexicanero, Nahuatl, Wixarika and Yorem Nokki for future research.

pdf bib
Deep Temporal-Recurrent-Replicated-Softmax for Topical Trends over Time
Pankaj Gupta | Subburam Rajaram | Hinrich Schütze | Bernt Andrassy
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Dynamic topic modeling facilitates the identification of topical trends over time in temporal collections of unstructured documents. We introduce a novel unsupervised neural dynamic topic model named as Recurrent Neural Network-Replicated Softmax Model (RNNRSM), where the discovered topics at each time influence the topic discovery in the subsequent time steps. We account for the temporal ordering of documents by explicitly modeling a joint distribution of latent topical dependencies over time, using distributional estimators with temporal recurrent connections. Applying RNN-RSM to 19 years of articles on NLP research, we demonstrate that compared to state-of-the art topic models, RNNRSM shows better generalization, topic interpretation, evolution and trends. We also introduce a metric (named as SPAN) to quantify the capability of dynamic topic model to capture word evolution in topics over time.

pdf bib
Evaluating neural network explanation methods using hybrid documents and morphosyntactic agreement
Nina Poerner | Hinrich Schütze | Benjamin Roth
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The behavior of deep neural networks (DNNs) is hard to understand. This makes it necessary to explore post hoc explanation methods. We conduct the first comprehensive evaluation of explanation methods for NLP. To this end, we design two novel evaluation paradigms that cover two important classes of NLP problems: small context and large context problems. Both paradigms require no manual annotation and are therefore broadly applicable. We also introduce LIMSSE, an explanation method inspired by LIME that is designed for NLP. We show empirically that LIMSSE, LRP and DeepLIFT are the most effective explanation methods and recommend them for explaining DNNs in NLP.

pdf bib
Two Methods for Domain Adaptation of Bilingual Tasks: Delightfully Simple and Broadly Applicable
Viktor Hangya | Fabienne Braune | Alexander Fraser | Hinrich Schütze
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Bilingual tasks, such as bilingual lexicon induction and cross-lingual classification, are crucial for overcoming data sparsity in the target language. Resources required for such tasks are often out-of-domain, thus domain adaptation is an important problem here. We make two contributions. First, we test a delightfully simple method for domain adaptation of bilingual word embeddings. We evaluate these embeddings on two bilingual tasks involving different domains: cross-lingual twitter sentiment classification and medical bilingual lexicon induction. Second, we tailor a broadly applicable semi-supervised classification method from computer vision to these tasks. We show that this method also helps in low-resource setups. Using both methods together we achieve large improvements over our baselines, by using only additional unlabeled data.

pdf bib
Embedding Learning Through Multilingual Concept Induction
Philipp Dufter | Mengjie Zhao | Martin Schmitt | Alexander Fraser | Hinrich Schütze
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present a new method for estimating vector space representations of words: embedding learning by concept induction. We test this method on a highly parallel corpus and learn semantic representations of words in 1259 different languages in a single common space. An extensive experimental evaluation on crosslingual word similarity and sentiment analysis indicates that concept-based multilingual embedding learning performs better than previous approaches.

pdf bib
End-Task Oriented Textual Entailment via Deep Explorations of Inter-Sentence Interactions
Wenpeng Yin | Dan Roth | Hinrich Schütze
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

This work deals with SciTail, a natural entailment challenge derived from a multi-choice question answering problem. The premises and hypotheses in SciTail were generated with no awareness of each other, and did not specifically aim at the entailment task. This makes it more challenging than other entailment data sets and more directly useful to the end-task – question answering. We propose DEISTE (deep explorations of inter-sentence interactions for textual entailment) for this entailment task. Given word-to-word interactions between the premise-hypothesis pair (P, H), DEISTE consists of: (i) a parameter-dynamic convolution to make important words in P and H play a dominant role in learnt representations; and (ii) a position-aware attentive convolution to encode the representation and position information of the aligned word pairs. Experiments show that DEISTE gets ≈5% improvement over prior state of the art and that the pretrained DEISTE on SciTail generalizes well on RTE-5.

pdf bib
Joint Semantic Synthesis and Morphological Analysis of the Derived Word
Ryan Cotterell | Hinrich Schütze
Transactions of the Association for Computational Linguistics, Volume 6

Much like sentences are composed of words, words themselves are composed of smaller units. For example, the English word questionably can be analyzed as question+able+ly. However, this structural decomposition of the word does not directly give us a semantic representation of the word’s meaning. Since morphology obeys the principle of compositionality, the semantics of the word can be systematically derived from the meaning of its parts. In this work, we propose a novel probabilistic model of word formation that captures both the analysis of a word w into its constituent segments and the synthesis of the meaning of w from the meanings of those segments. Our model jointly learns to segment words into morphemes and compose distributional semantic vectors of those morphemes. We experiment with the model on English CELEX data and German DErivBase (Zeller et al., 2013) data. We show that jointly modeling semantics increases both segmentation accuracy and morpheme F1 by between 3% and 5%. Additionally, we investigate different models of vector composition, showing that recurrent neural networks yield an improvement over simple additive models. Finally, we study the degree to which the representations correspond to a linguist’s notion of morphological productivity.

pdf bib
Attentive Convolution: Equipping CNNs with RNN-style Attention Mechanisms
Wenpeng Yin | Hinrich Schütze
Transactions of the Association for Computational Linguistics, Volume 6

In NLP, convolutional neural networks (CNNs) have benefited less than recurrent neural networks (RNNs) from attention mechanisms. We hypothesize that this is because the attention in CNNs has been mainly implemented as attentive pooling (i.e., it is applied to pooling) rather than as attentive convolution (i.e., it is integrated into convolution). Convolution is the differentiator of CNNs in that it can powerfully model the higher-level representation of a word by taking into account its local fixed-size context in the input text tx. In this work, we propose an attentive convolution network, ATTCONV. It extends the context scope of the convolution operation, deriving higher-level features for a word not only from local context, but also from information extracted from nonlocal context by the attention mechanism commonly used in RNNs. This nonlocal context can come (i) from parts of the input text tx that are distant or (ii) from extra (i.e., external) contexts ty. Experiments on sentence modeling with zero-context (sentiment analysis), single-context (textual entailment) and multiple-context (claim verification) demonstrate the effectiveness of ATTCONV in sentence representation learning with the incorporation of context. In particular, attentive convolution outperforms attentive pooling and is a strong competitor to popular attentive RNNs.1

pdf bib
Proceedings of the Second Workshop on Subword/Character LEvel Models
Manaal Faruqui | Hinrich Schütze | Isabel Trancoso | Yulia Tsvetkov | Yadollah Yaghoobzadeh
Proceedings of the Second Workshop on Subword/Character LEvel Models

pdf bib
Evaluating Word Embeddings in Multi-label Classification Using Fine-Grained Name Typing
Yadollah Yaghoobzadeh | Katharina Kann | Hinrich Schütze
Proceedings of The Third Workshop on Representation Learning for NLP

Embedding models typically associate each word with a single real-valued vector, representing its different properties. Evaluation methods, therefore, need to analyze the accuracy and completeness of these properties in embeddings. This requires fine-grained analysis of embedding subspaces. Multi-label classification is an appropriate way to do so. We propose a new evaluation method for word embeddings based on multi-label classification given a word embedding. The task we use is fine-grained name typing: given a large corpus, find all types that a name can refer to based on the name embedding. Given the scale of entities in knowledge bases, we can build datasets for this task that are complementary to the current embedding evaluation datasets in: they are very large, contain fine-grained classes, and allow the direct evaluation of embeddings without confounding factors like sentence context.

pdf bib
Replicated Siamese LSTM in Ticketing System for Similarity Learning and Retrieval in Asymmetric Texts
Pankaj Gupta | Bernt Andrassy | Hinrich Schütze
Proceedings of the Third Workshop on Semantic Deep Learning

The goal of our industrial ticketing system is to retrieve a relevant solution for an input query, by matching with historical tickets stored in knowledge base. A query is comprised of subject and description, while a historical ticket consists of subject, description and solution. To retrieve a relevant solution, we use textual similarity paradigm to learn similarity in the query and historical tickets. The task is challenging due to significant term mismatch in the query and ticket pairs of asymmetric lengths, where subject is a short text but description and solution are multi-sentence texts. We present a novel Replicated Siamese LSTM model to learn similarity in asymmetric text pairs, that gives 22% and 7% gain (Accuracy@10) for retrieval task, respectively over unsupervised and supervised baselines. We also show that the topic and distributed semantic features for short and long texts improved both similarity learning and retrieval.

pdf bib
LISA: Explaining Recurrent Neural Network Judgments via Layer-wIse Semantic Accumulation and Example to Pattern Transformation
Pankaj Gupta | Hinrich Schütze
Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

Recurrent neural networks (RNNs) are temporal networks and cumulative in nature that have shown promising results in various natural language processing tasks. Despite their success, it still remains a challenge to understand their hidden behavior. In this work, we analyze and interpret the cumulative nature of RNN via a proposed technique named as Layer-wIse-Semantic-Accumulation (LISA) for explaining decisions and detecting the most likely (i.e., saliency) patterns that the network relies on while decision making. We demonstrate (1) LISA: “How an RNN accumulates or builds semantics during its sequential processing for a given text example and expected response” (2) Example2pattern: “How the saliency patterns look like for each category in the data according to the network in decision making”. We analyse the sensitiveness of RNNs about different inputs to check the increase or decrease in prediction scores and further extract the saliency patterns learned by the network. We employ two relation classification datasets: SemEval 10 Task 8 and TAC KBP Slot Filling to explain RNN predictions via the LISA and example2pattern.

pdf bib
Interpretable Textual Neuron Representations for NLP
Nina Poerner | Benjamin Roth | Hinrich Schütze
Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

Input optimization methods, such as Google Deep Dream, create interpretable representations of neurons for computer vision DNNs. We propose and evaluate ways of transferring this technology to NLP. Our results suggest that gradient ascent with a gumbel softmax layer produces n-gram representations that outperform naive corpus search in terms of target neuron activation. The representations highlight differences in syntax awareness between the language and visual models of the Imaginet architecture.

pdf bib
Task Proposal: The TL;DR Challenge
Shahbaz Syed | Michael Völske | Martin Potthast | Nedim Lipka | Benno Stein | Hinrich Schütze
Proceedings of the 11th International Conference on Natural Language Generation

The TL;DR challenge fosters research in abstractive summarization of informal text, the largest and fastest-growing source of textual data on the web, which has been overlooked by summarization research so far. The challenge owes its name to the frequent practice of social media users to supplement long posts with a “TL;DR”—for “too long; didn’t read”—followed by a short summary as a courtesy to those who would otherwise reply with the exact same abbreviation to indicate they did not care to read a post for its apparent length. Posts featuring TL;DR summaries form an excellent ground truth for summarization, and by tapping into this resource for the first time, we have mined millions of training examples from social media, opening the door to all kinds of generative models.

pdf bib
Multi-Multi-View Learning: Multilingual and Multi-Representation Entity Typing
Yadollah Yaghoobzadeh | Hinrich Schütze
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Accurate and complete knowledge bases (KBs) are paramount in NLP. We employ mul-itiview learning for increasing the accuracy and coverage of entity type information in KBs. We rely on two metaviews: language and representation. For language, we consider high-resource and low-resource languages from Wikipedia. For representation, we consider representations based on the context distribution of the entity (i.e., on its embedding), on the entity’s name (i.e., on its surface form) and on its description in Wikipedia. The two metaviews language and representation can be freely combined: each pair of language and representation (e.g., German embedding, English description, Spanish name) is a distinct view. Our experiments on entity typing with fine-grained classes demonstrate the effectiveness of multiview learning. We release MVET, a large multiview — and, in particular, multilingual — entity typing dataset we created. Mono- and multilingual fine-grained entity typing systems can be evaluated on this dataset.

pdf bib
Neural Transductive Learning and Beyond: Morphological Generation in the Minimal-Resource Setting
Katharina Kann | Hinrich Schütze
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Neural state-of-the-art sequence-to-sequence (seq2seq) models often do not perform well for small training sets. We address paradigm completion, the morphological task of, given a partial paradigm, generating all missing forms. We propose two new methods for the minimal-resource setting: (i) Paradigm transduction: Since we assume only few paradigms available for training, neural seq2seq models are able to capture relationships between paradigm cells, but are tied to the idiosyncracies of the training set. Paradigm transduction mitigates this problem by exploiting the input subset of inflected forms at test time. (ii) Source selection with high precision (SHIP): Multi-source models which learn to automatically select one or multiple sources to predict a target inflection do not perform well in the minimal-resource setting. SHIP is an alternative to identify a reliable source if training data is limited. On a 52-language benchmark dataset, we outperform the previous state of the art by up to 9.71% absolute accuracy.

2017

pdf bib
Statistical Models for Unsupervised, Semi-Supervised Supervised Transliteration Mining
Hassan Sajjad | Helmut Schmid | Alexander Fraser | Hinrich Schütze
Computational Linguistics, Volume 43, Issue 2 - June 2017

We present a generative model that efficiently mines transliteration pairs in a consistent fashion in three different settings: unsupervised, semi-supervised, and supervised transliteration mining. The model interpolates two sub-models, one for the generation of transliteration pairs and one for the generation of non-transliteration pairs (i.e., noise). The model is trained on noisy unlabeled data using the EM algorithm. During training the transliteration sub-model learns to generate transliteration pairs and the fixed non-transliteration model generates the noise pairs. After training, the unlabeled data is disambiguated based on the posterior probabilities of the two sub-models. We evaluate our transliteration mining system on data from a transliteration mining shared task and on parallel corpora. For three out of four language pairs, our system outperforms all semi-supervised and supervised systems that participated in the NEWS 2010 shared task. On word pairs extracted from parallel corpora with fewer than 2% transliteration pairs, our system achieves up to 86.7% F-measure with 77.9% precision and 97.8% recall.

pdf bib
AutoExtend: Combining Word Embeddings with Semantic Resources
Sascha Rothe | Hinrich Schütze
Computational Linguistics, Volume 43, Issue 3 - September 2017

We present AutoExtend, a system that combines word embeddings with semantic resources by learning embeddings for non-word objects like synsets and entities and learning word embeddings that incorporate the semantic information from the resource. The method is based on encoding and decoding the word embeddings and is flexible in that it can take any word embeddings as input and does not need an additional training corpus. The obtained embeddings live in the same vector space as the input word embeddings. A sparse tensor formalization guarantees efficiency and parallelizability. We use WordNet, GermaNet, and Freebase as semantic resources. AutoExtend achieves state-of-the-art performance on Word-in-Context Similarity and Word Sense Disambiguation tasks.

pdf bib
Proceedings of the First Workshop on Subword and Character Level Models in NLP
Manaal Faruqui | Hinrich Schuetze | Isabel Trancoso | Yadollah Yaghoobzadeh
Proceedings of the First Workshop on Subword and Character Level Models in NLP

pdf bib
Unlabeled Data for Morphological Generation With Character-Based Sequence-to-Sequence Models
Katharina Kann | Hinrich Schütze
Proceedings of the First Workshop on Subword and Character Level Models in NLP

We present a semi-supervised way of training a character-based encoder-decoder recurrent neural network for morphological reinflection—the task of generating one inflected wordform from another. This is achieved by using unlabeled tokens or random strings as training data for an autoencoding task, adapting a network for morphological reinflection, and performing multi-task training. We thus use limited labeled data more effectively, obtaining up to 9.92% improvement over state-of-the-art baselines for 8 different languages.

pdf bib
Training Data Augmentation for Low-Resource Morphological Inflection
Toms Bergmanis | Katharina Kann | Hinrich Schütze | Sharon Goldwater
Proceedings of the CoNLL SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection

pdf bib
The LMU System for the CoNLL-SIGMORPHON 2017 Shared Task on Universal Morphological Reinflection
Katharina Kann | Hinrich Schütze
Proceedings of the CoNLL SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection

pdf bib
One-Shot Neural Cross-Lingual Transfer for Paradigm Completion
Katharina Kann | Ryan Cotterell | Hinrich Schütze
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present a novel cross-lingual transfer method for paradigm completion, the task of mapping a lemma to its inflected forms, using a neural encoder-decoder model, the state of the art for the monolingual task. We use labeled data from a high-resource language to increase performance on a low-resource language. In experiments on 21 language pairs from four different language families, we obtain up to 58% higher accuracy than without transfer and show that even zero-shot and one-shot learning are possible. We further find that the degree of language relatedness strongly influences the ability to transfer morphological knowledge.

pdf bib
Past, Present, Future: A Computational Investigation of the Typology of Tense in 1000 Languages
Ehsaneddin Asgari | Hinrich Schütze
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We present SuperPivot, an analysis method for low-resource languages that occur in a superparallel corpus, i.e., in a corpus that contains an order of magnitude more languages than parallel corpora currently in use. We show that SuperPivot performs well for the crosslingual analysis of the linguistic phenomenon of tense. We produce analysis results for more than 1000 languages, conducting – to the best of our knowledge – the largest crosslingual computational study performed to date. We extend existing methodology for leveraging parallel corpora for typological analysis by overcoming a limiting assumption of earlier work: We only require that a linguistic feature is overtly marked in a few of thousands of languages as opposed to requiring that it be marked in all languages under investigation.

pdf bib
Global Normalization of Convolutional Neural Networks for Joint Entity and Relation Classification
Heike Adel | Hinrich Schütze
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We introduce globally normalized convolutional neural networks for joint entity classification and relation extraction. In particular, we propose a way to utilize a linear-chain conditional random field output layer for predicting entity types and relations between entities at the same time. Our experiments show that global normalization outperforms a locally normalized softmax layer on a benchmark dataset.

pdf bib
Exploring Different Dimensions of Attention for Uncertainty Detection
Heike Adel | Hinrich Schütze
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Neural networks with attention have proven effective for many natural language processing tasks. In this paper, we develop attention mechanisms for uncertainty detection. In particular, we generalize standardly used attention mechanisms by introducing external attention and sequence-preserving attention. These novel architectures differ from standard approaches in that they use external resources to compute attention weights and preserve sequence information. We compare them to other configurations along different dimensions of attention. Our novel architectures set the new state of the art on a Wikipedia benchmark dataset and perform similar to the state-of-the-art model on a biomedical benchmark which uses a large set of linguistic features.

pdf bib
Neural Multi-Source Morphological Reinflection
Katharina Kann | Ryan Cotterell | Hinrich Schütze
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

We explore the task of multi-source morphological reinflection, which generalizes the standard, single-source version. The input consists of (i) a target tag and (ii) multiple pairs of source form and source tag for a lemma. The motivation is that it is beneficial to have access to more than one source form since different source forms can provide complementary information, e.g., different stems. We further present a novel extension to the encoder-decoder recurrent neural architecture, consisting of multiple encoders, to better solve the task. We show that our new architecture outperforms single-source reinflection models and publish our dataset for multi-source morphological reinflection to facilitate future research.

pdf bib
Multi-level Representations for Fine-Grained Typing of Knowledge Base Entities
Yadollah Yaghoobzadeh | Hinrich Schütze
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Entities are essential elements of natural language. In this paper, we present methods for learning multi-level representations of entities on three complementary levels: character (character patterns in entity names extracted, e.g., by neural networks), word (embeddings of words in entity names) and entity (entity embeddings). We investigate state-of-the-art learning methods on each level and find large differences, e.g., for deep learning models, traditional ngram features and the subword model of fasttext (Bojanowski et al., 2016) on the character level; for word2vec (Mikolov et al., 2013) on the word level; and for the order-aware model wang2vec (Ling et al., 2015a) on the entity level. We confirm experimentally that each level of representation contributes complementary information and a joint representation of all three levels improves the existing embedding based baseline for fine-grained entity typing by a large margin. Additionally, we show that adding information from entity descriptions further improves multi-level representations of entities.

pdf bib
Task-Specific Attentive Pooling of Phrase Alignments Contributes to Sentence Matching
Wenpeng Yin | Hinrich Schütze
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

This work studies comparatively two typical sentence matching tasks: textual entailment (TE) and answer selection (AS), observing that weaker phrase alignments are more critical in TE, while stronger phrase alignments deserve more attention in AS. The key to reach this observation lies in phrase detection, phrase representation, phrase alignment, and more importantly how to connect those aligned phrases of different matching degrees with the final classifier. Prior work (i) has limitations in phrase generation and representation, or (ii) conducts alignment at word and phrase levels by handcrafted features or (iii) utilizes a single framework of alignment without considering the characteristics of specific tasks, which limits the framework’s effectiveness across tasks. We propose an architecture based on Gated Recurrent Unit that supports (i) representation learning of phrases of arbitrary granularity and (ii) task-specific attentive pooling of phrase alignments between two sentences. Experimental results on TE and AS match our observation and show the effectiveness of our approach.

pdf bib
Nonsymbolic Text Representation
Hinrich Schütze
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

We introduce the first generic text representation model that is completely nonsymbolic, i.e., it does not require the availability of a segmentation or tokenization method that attempts to identify words or other symbolic units in text. This applies to training the parameters of the model on a training corpus as well as to applying it when computing the representation of a new text. We show that our model performs better than prior work on an information extraction and a text denoising task.

pdf bib
Noise Mitigation for Neural Entity Typing and Relation Extraction
Yadollah Yaghoobzadeh | Heike Adel | Hinrich Schütze
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

In this paper, we address two different types of noise in information extraction models: noise from distant supervision and noise from pipeline input features. Our target tasks are entity typing and relation extraction. For the first noise type, we introduce multi-instance multi-label learning algorithms using neural network models, and apply them to fine-grained entity typing for the first time. Our model outperforms the state-of-the-art supervised approach which uses global embeddings of entities. For the second noise type, we propose ways to improve the integration of noisy entity type predictions into relation extraction. Our experiments show that probabilistic predictions are more robust than discrete predictions and that joint training of the two tasks performs best.

pdf bib
End-to-End Trainable Attentive Decoder for Hierarchical Entity Classification
Sanjeev Karn | Ulli Waltinger | Hinrich Schütze
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

We address fine-grained entity classification and propose a novel attention-based recurrent neural network (RNN) encoder-decoder that generates paths in the type hierarchy and can be trained end-to-end. We show that our model performs better on fine-grained entity classification than prior work that relies on flat or local classifiers that do not directly model hierarchical structure.

2016

pdf bib
LAMB: A Good Shepherd of Morphologically Rich Languages
Sebastian Ebert | Thomas Müller | Hinrich Schütze
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Neural Morphological Analysis: Encoding-Decoding Canonical Segments
Katharina Kann | Ryan Cotterell | Hinrich Schütze
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Morphological Segmentation Inside-Out
Ryan Cotterell | Arun Kumar | Hinrich Schütze
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Intrinsic Subspace Evaluation of Word Embedding Representations
Yadollah Yaghoobzadeh | Hinrich Schütze
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Learning Word Meta-Embeddings
Wenpeng Yin | Hinrich Schütze
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Morphological Smoothing and Extrapolation of Word Embeddings
Ryan Cotterell | Hinrich Schütze | Jason Eisner
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Word Embedding Calculus in Meaningful Ultradense Subspaces
Sascha Rothe | Hinrich Schütze
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Single-Model Encoder-Decoder with Explicit Morphological Representation for Reinflection
Katharina Kann | Hinrich Schütze
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Attention-Based Convolutional Neural Network for Machine Comprehension
Wenpeng Yin | Sebastian Ebert | Hinrich Schütze
Proceedings of the Workshop on Human-Computer Question Answering

pdf bib
MED: The LMU System for the SIGMORPHON 2016 Shared Task on Morphological Reinflection
Katharina Kann | Hinrich Schütze
Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

pdf bib
Combining Recurrent and Convolutional Neural Networks for Relation Classification
Ngoc Thang Vu | Heike Adel | Pankaj Gupta | Hinrich Schütze
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
A Joint Model of Orthography and Morphological Segmentation
Ryan Cotterell | Tim Vieira | Hinrich Schütze
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Ultradense Word Embeddings by Orthogonal Transformation
Sascha Rothe | Sebastian Ebert | Hinrich Schütze
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Comparing Convolutional Neural Networks to Traditional Models for Slot Filling
Heike Adel | Benjamin Roth | Hinrich Schütze
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
ABCNN: Attention-Based Convolutional Neural Network for Modeling Sentence Pairs
Wenpeng Yin | Hinrich Schütze | Bing Xiang | Bowen Zhou
Transactions of the Association for Computational Linguistics, Volume 4

How to model a pair of sentences is a critical issue in many NLP tasks such as answer selection (AS), paraphrase identification (PI) and textual entailment (TE). Most prior work (i) deals with one individual task by fine-tuning a specific system; (ii) models each sentence’s representation separately, rarely considering the impact of the other sentence; or (iii) relies fully on manually designed, task-specific linguistic features. This work presents a general Attention Based Convolutional Neural Network (ABCNN) for modeling a pair of sentences. We make three contributions. (i) The ABCNN can be applied to a wide variety of tasks that require modeling of sentence pairs. (ii) We propose three attention schemes that integrate mutual influence between sentences into CNNs; thus, the representation of each sentence takes into consideration its counterpart. These interdependent sentence pair representations are more powerful than isolated sentence representations. (iii) ABCNNs achieve state-of-the-art performance on AS, PI and TE tasks. We release code at: https://github.com/yinwenpeng/Answer_Selection.

pdf bib
Simple Question Answering by Attentive Convolutional Neural Network
Wenpeng Yin | Mo Yu | Bing Xiang | Bowen Zhou | Hinrich Schütze
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

This work focuses on answering single-relation factoid questions over Freebase. Each question can acquire the answer from a single fact of form (subject, predicate, object) in Freebase. This task, simple question answering (SimpleQA), can be addressed via a two-step pipeline: entity linking and fact selection. In fact selection, we match the subject entity in a fact candidate with the entity mention in the question by a character-level convolutional neural network (char-CNN), and match the predicate in that fact with the question by a word-level CNN (word-CNN). This work makes two main contributions. (i) A simple and effective entity linker over Freebase is proposed. Our entity linker outperforms the state-of-the-art entity linker over SimpleQA task. (ii) A novel attentive maxpooling is stacked over word-CNN, so that the predicate representation can be matched with the predicate-focused question representation more effectively. Experiments show that our system sets new state-of-the-art in this task.

pdf bib
Table Filling Multi-Task Recurrent Neural Network for Joint Entity and Relation Extraction
Pankaj Gupta | Hinrich Schütze | Bernt Andrassy
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

This paper proposes a novel context-aware joint entity and word-level relation extraction approach through semantic composition of words, introducing a Table Filling Multi-Task Recurrent Neural Network (TF-MTRNN) model that reduces the entity recognition and relation classification tasks to a table-filling problem and models their interdependencies. The proposed neural network architecture is capable of modeling multiple relation instances without knowing the corresponding relation arguments in a sentence. The experimental results show that a simple approach of piggybacking candidate entities to model the label dependencies from relations to entities improves performance. We present state-of-the-art results with improvements of 2.0% and 2.7% for entity recognition and relation classification, respectively on CoNLL04 dataset.

2015

pdf bib
Learning Better Embeddings for Rare Words Using Distributional Representations
Irina Sergienya | Hinrich Schütze
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Corpus-level Fine-grained Entity Typing Using Contextual Information
Yadollah Yaghoobzadeh | Hinrich Schütze
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Online Updating of Word Representations for Part-of-Speech Tagging
Wenpeng Yin | Tobias Schnabel | Hinrich Schütze
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Joint Lemmatization and Morphological Tagging with Lemming
Thomas Müller | Ryan Cotterell | Alexander Fraser | Hinrich Schütze
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
A Linguistically Informed Convolutional Neural Network
Sebastian Ebert | Ngoc Thang Vu | Hinrich Schütze
Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

pdf bib
The Operation Sequence Model—Combining N-Gram-Based and Phrase-Based Statistical Machine Translation
Nadir Durrani | Helmut Schmid | Alexander Fraser | Philipp Koehn | Hinrich Schütze
Computational Linguistics, Volume 41, Issue 2 - June 2015

pdf bib
CIS-positive: A Combination of Convolutional Neural Networks and Support Vector Machines for Sentiment Analysis in Twitter
Sebastian Ebert | Ngoc Thang Vu | Hinrich Schütze
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf bib
MultiGranCNN: An Architecture for General Matching of Text Chunks on Multiple Levels of Granularity
Wenpeng Yin | Hinrich Schütze
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes
Sascha Rothe | Hinrich Schütze
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
Labeled Morphological Segmentation with Semi-Markov Models
Ryan Cotterell | Thomas Müller | Alexander Fraser | Hinrich Schütze
Proceedings of the Nineteenth Conference on Computational Natural Language Learning

pdf bib
Multichannel Variable-Size Convolution for Sentence Classification
Wenpeng Yin | Hinrich Schütze
Proceedings of the Nineteenth Conference on Computational Natural Language Learning

pdf bib
Robust Morphological Tagging with Word Representations
Thomas Müller | Hinrich Schuetze
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Convolutional Neural Network for Paraphrase Identification
Wenpeng Yin | Hinrich Schütze
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Morphological Word-Embeddings
Ryan Cotterell | Hinrich Schütze
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Discriminative Phrase Embedding for Paraphrase Identification
Wenpeng Yin | Hinrich Schütze
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

pdf bib
Multi-Domain Sentiment Relevance Classification with Automatic Representation Learning
Christian Scheible | Hinrich Schütze
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

pdf bib
FLORS: Fast and Simple Domain Adaptation for Part-of-Speech Tagging
Tobias Schnabel | Hinrich Schütze
Transactions of the Association for Computational Linguistics, Volume 2

We present FLORS, a new part-of-speech tagger for domain adaptation. FLORS uses robust representations that work especially well for unknown words and for known words with unseen tags. FLORS is simpler and faster than previous domain adaptation methods, yet it has significantly better accuracy than several baselines.

pdf bib
Unsupervised Training Set Generation for Automatic Acquisition of Technical Terminology in Patents
Alex Judea | Hinrich Schütze | Soeren Bruegmann
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib
Picking the Amateur’s Mind - Predicting Chess Player Strength from Game Annotations
Christian Scheible | Hinrich Schütze
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib
Dependency parsing with latent refinements of part-of-speech tags
Thomas Mueller | Richard Farkas | Alex Judea | Helmut Schmid | Hinrich Schuetze
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Fine-Grained Contextual Predictions for Hard Sentiment Words
Sebastian Ebert | Hinrich Schütze
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Using Mined Coreference Chains as a Resource for a Semantic Task
Heike Adel | Hinrich Schütze
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
CoSimRank: A Flexible & Efficient Graph-Theoretic Similarity Measure
Sascha Rothe | Hinrich Schütze
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Improving Citation Polarity Classification with Product Reviews
Charles Jochim | Hinrich Schütze
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
An Exploration of Embeddings for Generalized Phrases
Wenpeng Yin | Hinrich Schütze
Proceedings of the ACL 2014 Student Research Workshop

2013

pdf bib
Efficient Higher-Order CRFs for Morphological Tagging
Thomas Mueller | Helmut Schmid | Hinrich Schütze
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
The Topology of Semantic Knowledge
Jimmy Dubuisson | Jean-Pierre Eckmann | Christian Scheible | Hinrich Schütze
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
Towards Robust Cross-Domain Domain Adaptation for Part-of-Speech Tagging
Tobias Schnabel | Hinrich Schütze
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
Multilingual Lexicon Bootstrapping - Improving a Lexicon Induction System Using a Parallel Corpus
Patrick Ziering | Lonneke van der Plas | Hinrich Schütze
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
Bootstrapping Semantic Lexicons for Technical Domains
Patrick Ziering | Lonneke van der Plas | Hinrich Schütze
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Hinrich Schuetze | Pascale Fung | Massimo Poesio
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Sentiment Relevance
Christian Scheible | Hinrich Schütze
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Hinrich Schuetze | Pascale Fung | Massimo Poesio
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
The Operation Sequence Model: Integrating Translation and Reordering Operations in a Single Left-to-Right Model
Hinrich Schütze
Proceedings of Machine Translation Summit XIV: Plenaries

pdf bib
Knowledge Sources for Constituent Parsing of German, a Morphologically Rich and Less-Configurational Language
Alexander Fraser | Helmut Schmid | Richárd Farkas | Renjing Wang | Hinrich Schütze
Computational Linguistics, Volume 39, Issue 1 - March 2013

pdf bib
CodeX: Combining an SVM Classifier and Character N-gram Language Models for Sentiment Analysis on Twitter Text
Qi Han | Junfei Guo | Hinrich Schuetze
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

2012

pdf bib
Bootstrapping Sentiment Labels For Unannotated Documents With Polarity PageRank
Christian Scheible | Hinrich Schütze
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We present a novel graph-theoretic method for the initial annotation of high-confidence training data for bootstrapping sentiment classifiers. We estimate polarity using topic-specific PageRank. Sentiment information is propagated from an initial seed lexicon through a joint graph representation of words and documents. We report improved classification accuracies across multiple domains for the base models and the maximum entropy model bootstrapped from the PageRank annotation.

pdf bib
Automatic Detection of Point of View Differences in Wikipedia
Khalid Al Khatib | Hinrich Schütze | Cathleen Kantner
Proceedings of COLING 2012

pdf bib
Towards a Generic and Flexible Citation Classifier Based on a Faceted Classification Scheme
Charles Jochim | Hinrich Schütze
Proceedings of COLING 2012

pdf bib
Classification of Inconsistent Sentiment Words using Syntactic Constructions
Wiltrud Kessler | Hinrich Schütze
Proceedings of COLING 2012: Posters

pdf bib
A Comparative Investigation of Morphological Language Modeling for the Languages of the European Union
Thomas Mueller | Hinrich Schuetze | Helmut Schmid
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Active Learning for Coreference Resolution
Florian Laws | Florian Heimerl | Hinrich Schütze
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Automatic generation of short informative sentiment summaries
Andrea Glaser | Hinrich Schütze
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

2011

pdf bib
Supervised Coreference Resolution with SUCRE
Hamidreza Kobdani | Hinrich Schuetze
Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task

pdf bib
Half-Context Language Models
Hinrich Schütze | Michael Walsh
Computational Linguistics, Volume 37, Issue 4 - December 2011

pdf bib
A Cascaded Classification Approach to Semantic Head Recognition
Lukas Michelbacher | Alok Kothari | Martin Forst | Christina Lioma | Hinrich Schütze
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Active Learning with Amazon Mechanical Turk
Florian Laws | Christian Scheible | Hinrich Schütze
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Bootstrapping coreference resolution using word associations
Hamidreza Kobdani | Hinrich Schuetze | Michael Schiehlen | Hans Kamp
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition
Stefan Rüd | Massimiliano Ciaramita | Jens Müller | Hinrich Schütze
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Integrating history-length interpolation and classes in language modeling
Hinrich Schütze
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Improved Modeling of Out-Of-Vocabulary Words Using Morphological Classes
Thomas Mueller | Hinrich Schuetze
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

pdf bib
SUCRE: A Modular System for Coreference Resolution
Hamidreza Kobdani | Hinrich Schütze
Proceedings of the 5th International Workshop on Semantic Evaluation

pdf bib
Bitext-Based Resolution of German Subject-Object Ambiguities
Florian Schwarck | Alexander Fraser | Hinrich Schütze
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Identification of Rare & Novel Senses Using Translations in a Parallel Corpus
Richard Schwarz | Hinrich Schütze | Fabienne Martin | Achim Stein
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The identification of rare and novel senses is a challenge in lexicography. In this paper, we present a new method for finding such senses using a word aligned multilingual parallel corpus. We use the Europarl corpus and therein concentrate on French verbs. We represent each occurrence of a French verb as a high dimensional term vector. The dimensions of such a vector are the possible translations of the verb according to the underlying word alignment. The dimensions are weighted by a weighting scheme to adjust to the significance of any particular translation. After collecting these vectors we apply forms of the K-means algorithm on the resulting vector space to produce clusters of distinct senses, so that standard uses produce large homogeneous clusters while rare and novel uses appear in small or heterogeneous clusters. We show in a qualitative and quantitative evaluation that the method can successfully find rare and novel senses.

pdf bib
BabyExp: Constructing a Huge Multimodal Resource to Acquire Commonsense Knowledge Like Children Do
Massimo Poesio | Marco Baroni | Oswald Lanz | Alessandro Lenci | Alexandros Potamianos | Hinrich Schütze | Sabine Schulte im Walde | Luca Surian
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

There is by now widespread agreement that the most realistic way to construct the large-scale commonsense knowledge repositories required by natural language and artificial intelligence applications is by letting machines learn such knowledge from large quantities of data, like humans do. A lot of attention has consequently been paid to the development of increasingly sophisticated machine learning algorithms for knowledge extraction. However, the nature of the input that humans are exposed to while learning commonsense knowledge has received much less attention. The BabyExp project is collecting very dense audio and video recordings of the first 3 years of life of a baby. The corpus constructed in this way will be transcribed with automated techniques and made available to the research community. Moreover, techniques to extract commonsense conceptual knowledge incrementally from these multimodal data are also being explored within the project. The current paper describes BabyExp in general, and presents pilot studies on the feasibility of the automated audio and video transcriptions.

pdf bib
Building a Cross-lingual Relatedness Thesaurus using a Graph Similarity Measure
Lukas Michelbacher | Florian Laws | Beate Dorow | Ulrich Heid | Hinrich Schütze
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The Internet is an ever growing source of information stored in documents of different languages. Hence, cross-lingual resources are needed for more and more NLP applications. This paper presents (i) a graph-based method for creating one such resource and (ii) a resource created using the method, a cross-lingual relatedness thesaurus. Given a word in one language, the thesaurus suggests words in a second language that are semantically related. The method requires two monolingual corpora and a basic dictionary. Our general approach is to build two monolingual word graphs, with nodes representing words and edges representing linguistic relations between words. A bilingual dictionary containing basic vocabulary provides seed translations relating nodes from both graphs. We then use an inter-graph node-similarity algorithm to discover related words. Evaluation with three human judges revealed that 49% of the English and 57% of the German words discovered by our method are semantically related to the target words. We publish two resources in conjunction with this paper. First, noun coordinations extracted from the German and English Wikipedias. Second, the cross-lingual relatedness thesaurus which can be used in experiments involving interactive cross-lingual query expansion.

pdf bib
Fine-Grained Geographical Relation Extraction from Wikipedia
Andre Blessing | Hinrich Schütze
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, we present work on enhancing the basic data resource of a context-aware system. Electronic text offers a wealth of information about geospatial data and can be used to improve the completeness and accuracy of geospatial resources (e.g., gazetteers). First, we introduce a supervised approach to extracting geographical relations on a fine-grained level. Second, we present a novel way of using Wikipedia as a corpus based on self-annotation. A self-annotation is an automatically created high-quality annotation that can be used for training and evaluation. Wikipedia contains two types of different context: (i) unstructured text and (ii) structured data: templates (e.g., infoboxes about cities), lists and tables. We use the structured data to annotate the unstructured text. Finally, the extracted fine-grained relations are used to complete gazetteer data. The precision and recall scores of more than 97 percent confirm that a statistical IE pipeline can be used to improve the data quality of community-based resources.

pdf bib
Self-Annotation for fine-grained geospatial relation extraction
Andre Blessing | Hinrich Schütze
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

pdf bib
A Linguistically Grounded Graph Model for Bilingual Lexicon Extraction
Florian Laws | Lukas Michelbacher | Beate Dorow | Christian Scheible | Ulrich Heid | Hinrich Schütze
Coling 2010: Posters

pdf bib
Sentiment Translation through Multi-Edge Graphs
Christian Scheible | Florian Laws | Lukas Michelbacher | Hinrich Schütze
Coling 2010: Posters

2009

pdf bib
Word Alignment by Thresholded Two-Dimensional Normalization
Hamidreza Kobdani | Alexander Fraser | Hinrich Schütze
Proceedings of Machine Translation Summit XII: Posters

pdf bib
Rich Bitext Projection Features for Parse Reranking
Alexander Fraser | Renjing Wang | Hinrich Schütze
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

pdf bib
Frequency Matters: Pitch Accents and Information Status
Katrin Schweitzer | Michael Walsh | Bernd Möbius | Arndt Riester | Antje Schweitzer | Hinrich Schütze
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

pdf bib
Unsupervised Classification with Dependency Based Word Spaces
Klaus Rothenhäusler | Hinrich Schütze
Proceedings of the Workshop on Geometrical Models of Natural Language Semantics

pdf bib
On Proper Unit Selection in Active Learning: Co-Selection Effects for Named Entity Recognition
Katrin Tomanek | Florian Laws | Udo Hahn | Hinrich Schütze
Proceedings of the NAACL HLT 2009 Workshop on Active Learning for Natural Language Processing

2008

pdf bib
Stopping Criteria for Active Learning of Named Entity Recognition
Florian Laws | Hinrich Schütze
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

pdf bib
A Graph-theoretic Model of Lexical Syntactic Acquisition
Hinrich Schütze | Michael Walsh
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

pdf bib
An Inverted Index for Storing and Retrieving Grammatical Dependencies
Michaela Atterer | Hinrich Schütze
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Web count statistics gathered from search engines have been widely used as a resource in a variety of NLP tasks. For some tasks, however, the information they exploit is not fine-grained enough. We propose an inverted index over grammatical relations as a fast and reliable resource to access more general and also more detailed frequency information. To build the index, we use a dependency parser to parse a large corpus. We extract binary dependency relations, such as he-subj-say (“he” is the subject of “say”) as index terms and construct the index using publicly available open-source indexing software. The unit we index over is the sentence. The index can be used to extract grammatical relations and frequency counts for these relations. The framework also provides the possibility to search for partial dependencies (say, the frequency of “he” occurring in subject position), words, strings and a combination of these. One possible application is the disambiguation of syntactic structures.

pdf bib
A Question Answering System for German. Experiments with Morphological Linguistic Resources
Florian Koehler | Hinrich Schuetze | Michaela Atterer
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Question Answering systems are systems that enable the user to ask questions in natural language and to also receive an answer in natural language. Most existing systems, however, are constructed for the English language, and it is not clear in how far these approaches are also applicable to other languages. A richer morphology, greater syntactic variability, and smaller fraction of webpages available in the language are just some issues that complicate the construction of systems for German. In this paper, we present a modular Question Answering System for German which uses several morphological resources to increase recall. Nouns are converted into verbs, verbs into nouns, and the tenses of verbs are modified. We use a web search engine as a back end to allow for open-domain Question Answering. A POS-tagger is employed to identify answer candidates which are then filtered and tiled. The system is shown to achieve a higher recall than other systems for German.

2007

pdf bib
Squibs: Prepositional Phrase Attachment without Oracles
Michaela Atterer | Hinrich Schütze
Computational Linguistics, Volume 33, Number 4, December 2007

2006

pdf bib
The Effect of Corpus Size in Combining Supervised and Unsupervised Training for Disambiguation
Michaela Atterer | Hinrich Schütze
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

pdf bib
A Lattice-Based Framework for Enhancing Statistical Parsers with Information from Unlabeled Corpora
Michaela Atterer | Hinrich Schütze
Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X)

1999

pdf bib
Book Reviews: Ambiguity Resolution in Language Learning: Computational and Cognitive Models
Hinrich Schütze
Computational Linguistics, Volume 25, Number 3, September 1999

1998

pdf bib
Automatic Word Sense Discrimination
Hinrich Schütze
Computational Linguistics, Volume 24, Number 1, March 1998 - Special Issue on Word Sense Disambiguation

1997

pdf bib
Automatic Detection of Text Genre
Brett Kessler | Geoffrey Nunberg | Hinrich Schutze
35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics

1995

pdf bib
Distributional Part-of-Speech Tagging
Hinrich Schütze
Seventh Conference of the European Chapter of the Association for Computational Linguistics

1994

pdf bib
Part-of-Speech Tagging Using a Variable Memory Markov Model
Hinrich Schuetze | Yoram Singer
32nd Annual Meeting of the Association for Computational Linguistics

1993

pdf bib
Customizing a Lexicon to Better Suit a Computational Task
Marti Hearst | Hinrich Schuetze
Acquisition of Lexical Knowledge from Text

pdf bib
Part-of-Speech Induction From Scratch
Hinrich Schütze
31st Annual Meeting of the Association for Computational Linguistics

Search
Co-authors