Michael Strube - ACL Anthology

Michael Strube

2025

Discourse Relation-Enhanced Neural Coherence Modeling
Wei Liu | Michael Strube
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Discourse coherence theories posit relations between text spans as a key feature of coherent texts. However, existing work on coherence modeling has paid little attention to discourse relations. In this paper, we provide empirical evidence to demonstrate that relation features are correlated with text coherence. Then, we investigate a novel fusion model that uses position-aware attention and a visible matrix to combine text- and relation-based features for coherence assessment. Experimental results on two benchmarks show that our approaches can significantly improve baselines, demonstrating the importance of relation features for coherence modeling.

Proceedings of the 6th Workshop on Computational Approaches to Discourse, Context and Document-Level Inferences (CODI 2025)
Michael Strube | Chloe Braud | Christian Hardmeier | Junyi Jessy Li | Sharid Loaiciga | Amir Zeldes | Chuyuan Li
Proceedings of the 6th Workshop on Computational Approaches to Discourse, Context and Document-Level Inferences (CODI 2025)

Entity Tracking in Small Language Models: An Attention-Based Study of Parameter-Efficient Fine-Tuning
Sungho Jeon | Michael Strube
Proceedings of the 6th Workshop on Computational Approaches to Discourse, Context and Document-Level Inferences (CODI 2025)

The ability to track entities is fundamental for language understanding, yet the internal mechanisms governing this capability in Small Language Models (SLMs) are poorly understood. Previous studies often rely on indirect probing or complex interpretability methods, leaving a gap for lightweight diagnostics that connect model behavior to performance. To bridge this gap, we introduce a framework to analyze entity tracking by measuring the attention flow between entity and non-entity tokens within SLMs. We apply this to analyze models both before and after Parameter-Efficient Fine-Tuning (PEFT). Our analysis reveals two key findings. First, SLMs’ attentional strategies vary significantly with text type, but entities consistently receive a high degree of focus. Second, we show that PEFT – specifically QLoRA – dramatically improves classification performance on entity-centric tasks by increasing the model’s attentional focus on entity-related tokens. Our work provides direct evidence for how PEFT can refine a model’s internal mechanisms and establishes attention analysis as a valuable, lightweight diagnostic tool for interpreting and improving SLMs.

On the Role of Context for Discourse Relation Classification in Scientific Writing
Stephen Wan | Wei Liu | Michael Strube
Proceedings of the 6th Workshop on Computational Approaches to Discourse, Context and Document-Level Inferences (CODI 2025)

With the increasing use of generative Artificial Intelligence (AI) methods to support science workflows, we are interested in the use of discourse-level information to find supporting evidence for AI generated scientific claims. A first step towards this objective is to examine the task of inferring discourse structure in scientific writing.In this work, we present a preliminary investigation of pretrained language model (PLM) and Large Language Model (LLM) approaches for Discourse Relation Classification (DRC), focusing on scientific publications, an under-studied genre for this task. We examine how context can help with the DRC task, with our experiments showing that context, as defined by discourse structure, is generally helpful. We also present an analysis of which scientific discourse relation types might benefit most from context.

HITS at DISRPT 2025: Discourse Segmentation, Connective Detection, and Relation Classification
Souvik Banerjee | Yi Fan | Michael Strube
Proceedings of the 4th Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2025)

This paper describes the submission of the HITS team to the DISRPT 2025 shared task. The shared task includes three sub-tasks: (1) discourse unit segmentation across formalisms, (2) cross-lingual discourse connective identification, and (3) cross-formalism discourse relation classification. This paper presents our strategies for the DISRPT 2025 Shared Task. In Task 1, our approach involves fine-tuning through multilingual joint training on linguistically motivated language groups. We incorporated two key techniques to improve model performance: a weighted loss function to address the task’s significant class imbalance and Fast Gradient Method (FGM) adversarial training to boost the model’s robustness. In task 2, our approach involves building an ensemble of three encoder models whose embeddings are smartly fused together with a multi-head attention layer. We also add Part-Of-Speech tags and dependency relations present in the training file as linguistic features. A CRF layer is added after the classification layer to account for dependencies between adjacent labels. To account for label imbalance, we use focal loss and label smoothing. This ensures our model is robust and flexible enough to handle different languages. In task 3, we use two-stage fine-tuning framework designed to transfer the nuanced reasoning capabilities of a very large “teacher” model to a compact “student” model so that the smaller model can learn complex discourse relationships. The fine-tuning process follows a curriculum learning framework. In such a framework the model learns to perform increasingly harder tasks. In our case, the model first learns to look at the discourse units and then predict the label followed by looking at Chain-Of-Thought reasoning for harder examples. This way it can learn to internalise such reasoning and increase prediction accuracy on the harder samples.

Joint Modeling of Entities and Discourse Relations for Coherence Assessment
Wei Liu | Michael Strube
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

In linguistics, coherence can be achieved by different means, such as by maintaining reference to the same set of entities across sentences and by establishing discourse relations between them. However, most existing work on coherence modeling focuses exclusively on either entity features or discourse relation features, with little attention given to combining the two. In this study, we explore two methods for jointly modeling entities and discourse relations for coherence assessment. Experiments on three benchmark datasets show that integrating both types of features significantly enhances the performance of coherence models, highlighting the benefits of modeling both simultaneously for coherence evaluation.

Consistent Discourse-level Temporal Relation Extraction Using Large Language Models
Yi Fan | Michael Strube
Findings of the Association for Computational Linguistics: EMNLP 2025

Understanding temporal relations between events in a text is essential for determining its temporal structure. Recent advancements in large language models (LLMs) have spurred research on temporal relation extraction. However, LLMs perform poorly in zero-shot and few-shot settings, often underperforming smaller fine-tuned models. Despite these limitations, little attention has been given to improving LLMs in temporal structure extraction tasks. This study systematically examines LLMs’ ability to extract and infer discourse-level temporal relations, identifying factors influencing their reasoning and extraction capabilities, including input context, reasoning process and ensuring consistency. We propose a three-step framework to improve LLMs’ temporal relation extraction capabilities: context selection, prompts inspired by Allen’s interval algebra (Allen, 1983), and reflection-based consistency learning (Shinn et al., 2024). Our results show the effectiveness of our method in guiding LLMs towards structured processing of temporal structure in discourse.

2024

Proceedings of the 5th Workshop on Computational Approaches to Discourse (CODI 2024)
Michael Strube | Chloe Braud | Christian Hardmeier | Junyi Jessy Li | Sharid Loaiciga | Amir Zeldes | Chuyuan Li
Proceedings of the 5th Workshop on Computational Approaches to Discourse (CODI 2024)

Graph-based Clustering for Detecting Semantic Change Across Time and Languages
Xianghe Ma | Michael Strube | Wei Zhao
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Despite the predominance of contextualized embeddings in NLP, approaches to detect semantic change relying on these embeddings and clustering methods underperform simpler counterparts based on static word embeddings. This stems from the poor quality of the clustering methods to produce sense clusters—which struggle to capture word senses, especially those with low frequency. This issue hinders the next step in examining how changes in word senses in one language influence another. To address this issue, we propose a graph-based clustering approach to capture nuanced changes in both high- and low-frequency word senses across time and languages, including the acquisition and loss of these senses over time. Our experimental results show that our approach substantially surpasses previous approaches in the SemEval2020 binary classification task across four languages. Moreover, we showcase the ability of our approach as a versatile visualization tool to detect semantic changes in both intra-language and inter-language setups. We make our code and data publicly available.

What Causes the Failure of Explicit to Implicit Discourse Relation Recognition?
Wei Liu | Stephen Wan | Michael Strube
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

We consider an unanswered question in the discourse processing community: why do relation classifiers trained on explicit examples (with connectives removed) perform poorly in real implicit scenarios? Prior work claimed this is due to linguistic dissimilarity between explicit and implicit examples but provided no empirical evidence. In this study, we show that one cause for such failure is a label shift after connectives are eliminated. Specifically, we find that the discourse relations expressed by some explicit instances will change when connectives disappear. Unlike previous work manually analyzing a few examples, we present empirical evidence at the corpus level to prove the existence of such shift. Then, we analyze why label shift occurs by considering factors such as the syntactic role played by connectives, ambiguity of connectives, and more. Finally, we investigate two strategies to mitigate the label shift: filtering out noisy data and joint learning with connectives. Experiments on PDTB 2.0, PDTB 3.0, and the GUM dataset demonstrate that classifiers trained with our strategies outperform strong baselines.

2023

Cross-lingual Science Journalism: Select, Simplify and Rewrite Summaries for Non-expert Readers
Mehwish Fatima | Michael Strube
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Automating Cross-lingual Science Journalism (CSJ) aims to generate popular science summaries from English scientific texts for non-expert readers in their local language. We introduce CSJ as a downstream task of text simplification and cross-lingual scientific summarization to facilitate science journalists’ work. We analyze the performance of possible existing solutions as baselines for the CSJ task. Based on these findings, we propose to combine the three components - SELECT, SIMPLIFY and REWRITE (SSR) to produce cross-lingual simplified science summaries for non-expert readers. Our empirical evaluation on the Wikipedia dataset shows that SSR significantly outperforms the baselines for the CSJ task and can serve as a strong baseline for future work. We also perform an ablation study investigating the impact of individual components of SSR. Further, we analyze the performance of SSR on a high-quality, real-world CSJ dataset with human evaluation and in-depth analysis, demonstrating the superior performance of SSR for CSJ.

Modeling Structural Similarities between Documents for Coherence Assessment with Graph Convolutional Networks
Wei Liu | Xiyan Fu | Michael Strube
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Coherence is an important aspect of text quality, and various approaches have been applied to coherence modeling. However, existing methods solely focus on a single document’s coherence patterns, ignoring the underlying correlation between documents. We investigate a GCN-based coherence model that is capable of capturing structural similarities between documents. Our model first creates a graph structure for each document, from where we mine different subgraph patterns. We then construct a heterogeneous graph for the training corpus, connecting documents based on their shared subgraphs. Finally, a GCN is applied to the heterogeneous graph to model the connectivity relationships. We evaluate our method on two tasks, assessing discourse coherence and automated essay scoring. Results show that our GCN-based model outperforms all baselines, achieving a new state-of-the-art on both tasks.

Annotation-Inspired Implicit Discourse Relation Classification with Auxiliary Discourse Connective Generation
Wei Liu | Michael Strube
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Implicit discourse relation classification is a challenging task due to the absence of discourse connectives. To overcome this issue, we design an end-to-end neural model to explicitly generate discourse connectives for the task, inspired by the annotation process of PDTB. Specifically, our model jointly learns to generate discourse connectives between arguments and predict discourse relations based on the arguments and the generated connectives. To prevent our relation classifier from being misled by poor connectives generated at the early stage of training while alleviating the discrepancy between training and inference, we adopt Scheduled Sampling to the joint learning. We evaluate our method on three benchmarks, PDTB 2.0, PDTB 3.0, and PCC. Results show that our joint model significantly outperforms various baselines on three datasets, demonstrating its superiority for the task.

Proceedings of the 4th Workshop on Computational Approaches to Discourse (CODI 2023)
Michael Strube | Chloe Braud | Christian Hardmeier | Junyi Jessy Li | Sharid Loaiciga | Amir Zeldes
Proceedings of the 4th Workshop on Computational Approaches to Discourse (CODI 2023)

HITS at DISRPT 2023: Discourse Segmentation, Connective Detection, and Relation Classification
Wei Liu | Yi Fan | Michael Strube
Proceedings of the 3rd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2023)

HITS participated in the Discourse Segmentation (DS, Task 1) and Connective Detection (CD, Task 2) tasks at the DISRPT 2023. Task 1 focuses on segmenting the text into discourse units, while Task 2 aims to detect the discourse connectives. We deployed a framework based on different pre-trained models according to the target language for these two tasks.HITS also participated in the Relation Classification track (Task 3). The main task was recognizing the discourse relation between text spans from different languages. We designed a joint model for languages with a small corpus while separate models for large corpora. The adversarial training strategy is applied to enhance the robustness of relation classifiers.

DiscoScore: Evaluating Text Generation with BERT and Discourse Coherence
Wei Zhao | Michael Strube | Steffen Eger
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Recently, there has been a growing interest in designing text generation systems from a discourse coherence perspective, e.g., modeling the interdependence between sentences. Still, recent BERT-based evaluation metrics are weak in recognizing coherence, and thus are not reliable in a way to spot the discourse-level improvements of those text generation systems. In this work, we introduce DiscoScore, a parametrized discourse metric, which uses BERT to model discourse coherence from different perspectives, driven by Centering theory. Our experiments encompass 16 non-discourse and discourse metrics, including DiscoScore and popular coherence models, evaluated on summarization and document-level machine translation (MT). We find that (i) the majority of BERT-based metrics correlate much worse with human rated coherence than early discourse metrics, invented a decade ago; (ii) the recent state-of-the-art BARTScore is weak when operated at system level—which is particularly problematic as systems are typically compared in this manner. DiscoScore, in contrast, achieves strong system-level correlation with human ratings, not only in coherence but also in factual consistency and other aspects, and surpasses BARTScore by over 10 correlation points on average. Further, aiming to understand DiscoScore, we provide justifications to the importance of discourse coherence for evaluation metrics, and explain the superiority of one variant over another. Our code is available at https://github.com/AIPHES/DiscoScore.

Investigating Multilingual Coreference Resolution by Universal Annotations
Haixia Chai | Michael Strube
Findings of the Association for Computational Linguistics: EMNLP 2023

Multilingual coreference resolution (MCR) has been a long-standing and challenging task. With the newly proposed multilingual coreference dataset, CorefUD (Nedoluzhko et al., 2022), we conduct an investigation into the task by using its harmonized universal morphosyntactic and coreference annotations. First, we study coreference by examining the ground truth data at different linguistic levels, namely mention, entity and document levels, and across different genres, to gain insights into the characteristics of coreference across multiple languages. Second, we perform an error analysis of the most challenging cases that the SotA system fails to resolve in the CRAC 2022 shared task using the universal annotations. Last, based on this analysis, we extract features from universal morphosyntactic annotations and integrate these features into a baseline system to assess their potential benefits for the MCR task. Our results show that our best configuration of features improves the baseline by 0.9% F1 score.

SimCSum: Joint Learning of Simplification and Cross-lingual Summarization for Cross-lingual Science Journalism
Mehwish Fatima | Tim Kolber | Katja Markert | Michael Strube
Proceedings of the 4th New Frontiers in Summarization Workshop

Cross-lingual science journalism is a recently introduced task that generates popular science summaries of scientific articles different from the source language for non-expert readers. A popular science summary must contain salient content of the input document while focusing on coherence and comprehensibility. Meanwhile, generating a cross-lingual summary from the scientific texts in a local language for the targeted audience is challenging. Existing research on cross-lingual science journalism investigates the task with a pipeline model to combine text simplification and cross-lingual summarization. We extend the research in cross-lingual science journalism by introducing a novel, multi-task learning architecture that combines the aforementioned NLP tasks. Our approach is to jointly train the two high-level NLP tasks in SimCSum for generating cross-lingual popular science summaries. We investigate the performance of SimCSum against the pipeline model and several other strong baselines with several evaluation metrics and human evaluation. Overall, SimCSum demonstrates statistically significant improvements over the state-of-the-art on two non-synthetic cross-lingual scientific datasets. Furthermore, we conduct an in-depth investigation into the linguistic properties of generated summaries and an error analysis.

2022

Entity-based Neural Local Coherence Modeling
Sungho Jeon | Michael Strube
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this paper, we propose an entity-based neural local coherence model which is linguistically more sound than previously proposed neural coherence models. Recent neural coherence models encode the input document using large-scale pretrained language models. Hence their basis for computing local coherence are words and even sub-words. The analysis of their output shows that these models frequently compute coherence on the basis of connections between (sub-)words which, from a linguistic perspective, should not play a role. Still, these models achieve state-of-the-art performance in several end applications. In contrast to these models, we compute coherence on the basis of entities by constraining the input to noun phrases and proper names. This provides us with an explicit representation of the most important items in sentences leading to the notion of focus. This brings our model linguistically in line with pre-neural models of computing coherence. It also gives us better insight into the behaviour of the model thus leading to better explainability. Our approach is also in accord with a recent study (O’Connor and Andreas, 2021), which shows that most usable information is captured by nouns and verbs in transformer-based language models. We evaluate our model on three downstream tasks showing that it is not only linguistically more sound than previous models but also that it outperforms them in end applications.

Fine-tuning BERT Models for Summarizing German Radiology Findings
Siting Liang | Klaus Kades | Matthias Fink | Peter Full | Tim Weber | Jens Kleesiek | Michael Strube | Klaus Maier-Hein
Proceedings of the 4th Clinical Natural Language Processing Workshop

Writing the conclusion section of radiology reports is essential for communicating the radiology findings and its assessment to physician in a condensed form. In this work, we employ a transformer-based Seq2Seq model for generating the conclusion section of German radiology reports. The model is initialized with the pretrained parameters of a German BERT model and fine-tuned in our downstream task on our domain data. We proposed two strategies to improve the factual correctness of the model. In the first method, next to the abstractive learning objective, we introduce an extraction learning objective to train the decoder in the model to both generate one summary sequence and extract the key findings from the source input. The second approach is to integrate the pointer mechanism into the transformer-based Seq2Seq model. The pointer network helps the Seq2Seq model to choose between generating tokens from the vocabulary or copying parts from the source input during generation. The results of the automatic and human evaluations show that the enhanced Seq2Seq model is capable of generating human-like radiology conclusions and that the improved models effectively reduce the factual errors in the generations despite the small amount of training data.

Proceedings of the 3rd Workshop on Computational Approaches to Discourse
Chloe Braud | Christian Hardmeier | Junyi Jessy Li | Sharid Loaiciga | Michael Strube | Amir Zeldes
Proceedings of the 3rd Workshop on Computational Approaches to Discourse

Proceedings of the CODI-CRAC 2022 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue
Juntao Yu | Sopan Khosla | Ramesh Manuvinakurike | Lori Levin | Vincent Ng | Massimo Poesio | Michael Strube | Carolyn Rose
Proceedings of the CODI-CRAC 2022 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue

The CODI-CRAC 2022 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue
Juntao Yu | Sopan Khosla | Ramesh Manuvinakurike | Lori Levin | Vincent Ng | Massimo Poesio | Michael Strube | Carolyn Rosé
Proceedings of the CODI-CRAC 2022 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue

The CODI-CRAC 2022 Shared Task on Anaphora Resolution in Dialogues is the second edition of an initiative focused on detecting different types of anaphoric relations in conversations of different kinds. Using five conversational datasets, four of which have been newly annotated with a wide range of anaphoric relations: identity, bridging references and discourse deixis, we defined multiple tasks focusing individually on these key relations. The second edition of the shared task maintained the focus on these relations and used the same datasets as in 2021, but new test data were annotated, the 2021 data were checked, and new subtasks were added. In this paper, we discuss the annotation schemes, the datasets, the evaluation scripts used to assess the system performance on these tasks, and provide a brief summary of the participating systems and the results obtained across 230 runs from three teams, with most submissions achieving significantly better results than our baseline methods.

Evaluating Coreference Resolvers on Community-based Question Answering: From Rule-based to State of the Art
Haixia Chai | Nafise Sadat Moosavi | Iryna Gurevych | Michael Strube
Proceedings of the Fifth Workshop on Computational Models of Reference, Anaphora and Coreference

Coreference resolution is a key step in natural language understanding. Developments in coreference resolution are mainly focused on improving the performance on standard datasets annotated for coreference resolution. However, coreference resolution is an intermediate step for text understanding and it is not clear how these improvements translate into downstream task performance. In this paper, we perform a thorough investigation on the impact of coreference resolvers in multiple settings of community-based question answering task, i.e., answer selection with long answers. Our settings cover multiple text domains and encompass several answer selection methods. We first inspect extrinsic evaluation of coreference resolvers on answer selection by using coreference relations to decontextualize individual sentences of candidate answers, and then annotate a subset of answers with coreference information for intrinsic evaluation. The results of our extrinsic evaluation show that while there is a significant difference between the performance of the rule-based system vs. state-of-the-art neural model on coreference resolution datasets, we do not observe a considerable difference on their impact on downstream models. Our intrinsic evaluation shows that (i) resolving coreference relations on less-formal text genres is more difficult even for trained annotators, and (ii) the values of linguistic-agnostic coreference evaluation metrics do not correlate with the impact on downstream data.

Incorporating Centering Theory into Neural Coreference Resolution
Haixia Chai | Michael Strube
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

In recent years, transformer-based coreference resolution systems have achieved remarkable improvements on the CoNLL dataset. However, how coreference resolvers can benefit from discourse coherence is still an open question. In this paper, we propose to incorporate centering transitions derived from centering theory in the form of a graph into a neural coreference model. Our method improves the performance over the SOTA baselines, especially on pronoun resolution in long documents, formal well-structured text, and clusters with scattered mentions.

2021

Proceedings of the 2nd Workshop on Computational Approaches to Discourse
Chloé Braud | Christian Hardmeier | Junyi Jessy Li | Annie Louis | Michael Strube | Amir Zeldes
Proceedings of the 2nd Workshop on Computational Approaches to Discourse

Proceedings of the CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue
Sopan Khosla | Ramesh Manuvinakurike | Vincent Ng | Massimo Poesio | Michael Strube | Carolyn Rosé
Proceedings of the CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue

The CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue
Sopan Khosla | Juntao Yu | Ramesh Manuvinakurike | Vincent Ng | Massimo Poesio | Michael Strube | Carolyn Rosé
Proceedings of the CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue

In this paper, we provide an overview of the CODI-CRAC 2021 Shared-Task: Anaphora Resolution in Dialogue. The shared task focuses on detecting anaphoric relations in different genres of conversations. Using five conversational datasets, four of which have been newly annotated with a wide range of anaphoric relations: identity, bridging references and discourse deixis, we defined multiple subtasks focusing individually on these key relations. We discuss the evaluation scripts used to assess the system performance on these subtasks, and provide a brief summary of the participating systems and the results obtained across ?? runs from 5 teams, with most submissions achieving significantly better results than our baseline methods.

A Novel Wikipedia based Dataset for Monolingual and Cross-Lingual Summarization
Mehwish Fatima | Michael Strube
Proceedings of the Third Workshop on New Frontiers in Summarization

Cross-lingual summarization is a challenging task for which there are no cross-lingual scientific resources currently available. To overcome the lack of a high-quality resource, we present a new dataset for monolingual and cross-lingual summarization considering the English-German pair. We collect high-quality, real-world cross-lingual data from Spektrum der Wissenschaft, which publishes human-written German scientific summaries of English science articles on various subjects. The generated Spektrum dataset is small; therefore, we harvest a similar dataset from the Wikipedia Science Portal to complement it. The Wikipedia dataset consists of English and German articles, which can be used for monolingual and cross-lingual summarization. Furthermore, we present a quantitative analysis of the datasets and results of empirical experiments with several existing extractive and abstractive summarization models. The results suggest the viability and usefulness of the proposed dataset for monolingual and cross-lingual summarization.

Countering the Influence of Essay Length in Neural Essay Scoring
Sungho Jeon | Michael Strube
Proceedings of the Second Workshop on Simple and Efficient Natural Language Processing

Previous work has shown that automated essay scoring systems, in particular machine learning-based systems, are not capable of assessing the quality of essays, but are relying on essay length, a factor irrelevant to writing proficiency. In this work, we first show that state-of-the-art systems, recent neural essay scoring systems, might be also influenced by the correlation between essay length and scores in a standard dataset. In our evaluation, a very simple neural model shows the state-of-the-art performance on the standard dataset. To consider essay content without taking essay length into account, we introduce a simple neural model assessing the similarity of content between an input essay and essays assigned different scores. This neural model achieves performance comparable to the state of the art on a standard dataset as well as on a second dataset. Our findings suggest that neural essay scoring systems should consider the characteristics of datasets to focus on text quality.

2020

Proceedings of the First Workshop on Computational Approaches to Discourse
Chloé Braud | Christian Hardmeier | Junyi Jessy Li | Annie Louis | Michael Strube
Proceedings of the First Workshop on Computational Approaches to Discourse

Evaluation of Coreference Resolution Systems Under Adversarial Attacks
Haixia Chai | Wei Zhao | Steffen Eger | Michael Strube
Proceedings of the First Workshop on Computational Approaches to Discourse

A substantial overlap of coreferent mentions in the CoNLL dataset magnifies the recent progress on coreference resolution. This is because the CoNLL benchmark fails to evaluate the ability of coreference resolvers that requires linking novel mentions unseen at train time. In this work, we create a new dataset based on CoNLL, which largely decreases mention overlaps in the entire dataset and exposes the limitations of published resolvers on two aspects—lexical inference ability and understanding of low-level orthographic noise. Our findings show (1) the requirements for embeddings, used in resolvers, and for coreference resolutions are, by design, in conflict and (2) adversarial approaches are sometimes not legitimate to mitigate the obstacles, as they may falsely introduce mention overlaps in adversarial training and test sets, thus giving an inflated impression for the improvements.

Incremental Neural Lexical Coherence Modeling
Sungho Jeon | Michael Strube
Proceedings of the 28th International Conference on Computational Linguistics

Pretrained language models, neural models pretrained on massive amounts of data, have established the state of the art in a range of NLP tasks. They are based on a modern machine-learning technique, the Transformer which relates all items simultaneously to capture semantic relations in sequences. However, it differs from what humans do. Humans read sentences one-by-one, incrementally. Can neural models benefit by interpreting texts incrementally as humans do? We investigate this question in coherence modeling. We propose a coherence model which interprets sentences incrementally to capture lexical relations between them. We compare the state of the art in each task, simple neural models relying on a pretrained language model, and our model in two downstream tasks. Our findings suggest that interpreting texts incrementally as humans could be useful to design more advanced models.

Centering-based Neural Coherence Modeling with Hierarchical Discourse Segments
Sungho Jeon | Michael Strube
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Previous neural coherence models have focused on identifying semantic relations between adjacent sentences. However, they do not have the means to exploit structural information. In this work, we propose a coherence model which takes discourse structural information into account without relying on human annotations. We approximate a linguistic theory of coherence, Centering theory, which we use to track the changes of focus between discourse segments. Our model first identifies the focus of each sentence, recognized with regards to the context, and constructs the structural relationship for discourse segments by tracking the changes of the focus. The model then incorporates this structural information into a structure-aware transformer. We evaluate our model on two tasks, automated essay scoring and assessing writing quality. Our results demonstrate that our model, built on top of a pretrained language model, achieves state-of-the-art performance on both tasks. We next statistically examine the identified trees of texts assigned to different quality scores. Finally, we investigate what our model learns in terms of theoretical claims.

A Fully Hyperbolic Neural Model for Hierarchical Multi-Class Classification
Federico López | Michael Strube
Findings of the Association for Computational Linguistics: EMNLP 2020

Label inventories for fine-grained entity typing have grown in size and complexity. Nonetheless, they exhibit a hierarchical structure. Hyperbolic spaces offer a mathematically appealing approach for learning hierarchical representations of symbolic data. However, it is not clear how to integrate hyperbolic components into downstream tasks. This is the first work that proposes a fully hyperbolic model for multi-class multi-label classification, which performs all operations in hyperbolic space. We evaluate the proposed model on two challenging datasets and compare to different baselines that operate under Euclidean assumptions. Our hyperbolic model infers the latent hierarchy from the class distribution, captures implicit hyponymic relations in the inventory, and shows performance on par with state-of-the-art methods on fine-grained classification with remarkable reduction of the parameter size. A thorough analysis sheds light on the impact of each component in the final prediction and showcases its ease of integration with Euclidean layers.

A Large Harvested Corpus of Location Metonymy
Kevin Alex Mathews | Michael Strube
Proceedings of the Twelfth Language Resources and Evaluation Conference

Metonymy is a figure of speech in which an entity is referred to by another related entity. The existing datasets of metonymy are either too small in size or lack sufficient coverage. We propose a new, labelled, high-quality corpus of location metonymy called WiMCor, which is large in size and has high coverage. The corpus is harvested semi-automatically from English Wikipedia. We use different labels of varying granularity to annotate the corpus. The corpus can directly be used for training and evaluating automatic metonymy resolution systems. We construct benchmarks for metonymy resolution, and evaluate baseline methods using the new corpus.

Reconstructing Manual Information Extraction with DB-to-Document Backprojection: Experiments in the Life Science Domain
Mark-Christoph Müller | Sucheta Ghosh | Maja Rey | Ulrike Wittig | Wolfgang Müller | Michael Strube
Proceedings of the First Workshop on Scholarly Document Processing

We introduce a novel scientific document processing task for making previously inaccessible information in printed paper documents available to automatic processing. We describe our data set of scanned documents and data records from the biological database SABIO-RK, provide a definition of the task, and report findings from preliminary experiments. Rigorous evaluation proved challenging due to lack of gold-standard data and a difficult notion of correctness. Qualitative inspection of results, however, showed the feasibility and usefulness of the task

2019

Adapting Deep Learning Methods for Mental Health Prediction on Social Media
Ivan Sekulic | Michael Strube
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)

Mental health poses a significant challenge for an individual’s well-being. Text analysis of rich resources, like social media, can contribute to deeper understanding of illnesses and provide means for their early detection. We tackle a challenge of detecting social media users’ mental status through deep learning-based models, moving away from traditional approaches to the task. In a binary classification task on predicting if a user suffers from one of nine different disorders, a hierarchical attention network outperforms previously set benchmarks for four of the disorders. Furthermore, we explore the limitations of our model and analyze phrases relevant for classification by inspecting the model’s word-level attention weights.

On the Importance of Subword Information for Morphological Tasks in Truly Low-Resource Languages
Yi Zhu | Benjamin Heinzerling | Ivan Vulić | Michael Strube | Roi Reichart | Anna Korhonen
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Recent work has validated the importance of subword information for word representation learning. Since subwords increase parameter sharing ability in neural models, their value should be even more pronounced in low-data regimes. In this work, we therefore provide a comprehensive analysis focused on the usefulness of subwords for word representation learning in truly low-resource scenarios and for three representative morphological tasks: fine-grained entity typing, morphological tagging, and named entity recognition. We conduct a systematic study that spans several dimensions of comparison: 1) type of data scarcity which can stem from the lack of task-specific training data, or even from the lack of unannotated data required to train word embeddings, or both; 2) language type by working with a sample of 16 typologically diverse languages including some truly low-resource ones (e.g. Rusyn, Buryat, and Zulu); 3) the choice of the subword-informed word representation method. Our main results show that subword-informed models are universally useful across all language types, with large gains over subword-agnostic embeddings. They also suggest that the effective use of subwords largely depends on the language (type) and the task at hand, as well as on the amount of available data for training the embeddings and task-based models, where having sufficient in-task data is a more critical requirement.

Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorials
Anoop Sarkar | Michael Strube
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorials

Sequence Tagging with Contextual and Non-Contextual Subword Representations: A Multilingual Evaluation
Benjamin Heinzerling | Michael Strube
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Pretrained contextual and non-contextual subword embeddings have become available in over 250 languages, allowing massively multilingual NLP. However, while there is no dearth of pretrained embeddings, the distinct lack of systematic evaluations makes it difficult for practitioners to choose between them. In this work, we conduct an extensive evaluation comparing non-contextual subword embeddings, namely FastText and BPEmb, and a contextual representation method, namely BERT, on multilingual named entity recognition and part-of-speech tagging. We find that overall, a combination of BERT, BPEmb, and character representations works best across languages and tasks. A more detailed analysis reveals different strengths and weaknesses: Multilingual BERT performs well in medium- to high-resource languages, but is outperformed by non-contextual subword embeddings in a low-resource setting.

Using Automatically Extracted Minimum Spans to Disentangle Coreference Evaluation from Boundary Detection
Nafise Sadat Moosavi | Leo Born | Massimo Poesio | Michael Strube
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

The common practice in coreference resolution is to identify and evaluate the maximum span of mentions. The use of maximum spans tangles coreference evaluation with the challenges of mention boundary detection like prepositional phrase attachment. To address this problem, minimum spans are manually annotated in smaller corpora. However, this additional annotation is costly and therefore, this solution does not scale to large corpora. In this paper, we propose the MINA algorithm for automatically extracting minimum spans to benefit from minimum span evaluation in all corpora. We show that the extracted minimum spans by MINA are consistent with those that are manually annotated by experts. Our experiments show that using minimum spans is in particular important in cross-dataset coreference evaluation, in which detected mention boundaries are noisier due to domain shift. We have integrated MINA into https://github.com/ns-moosavi/coval for reporting standard coreference scores based on both maximum and automatically detected minimum spans.

Fine-Grained Entity Typing in Hyperbolic Space
Federico López | Benjamin Heinzerling | Michael Strube
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)

How can we represent hierarchical information present in large type inventories for entity typing? We study the suitability of hyperbolic embeddings to capture hierarchical relations between mentions in context and their target types in a shared vector space. We evaluate on two datasets and propose two different techniques to extract hierarchical information from the type inventory: from an expert-generated ontology and by automatically mining the dataset. The hyperbolic model shows improvements in some but not all cases over its Euclidean counterpart. Our analysis suggests that the adequacy of this geometry depends on the granularity of the type inventory and the representation of its distribution.

2018

Transparent, Efficient, and Robust Word Embedding Access with WOMBAT
Mark-Christoph Müller | Michael Strube
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

We present WOMBAT, a Python tool which supports NLP practitioners in accessing word embeddings from code. WOMBAT addresses common research problems, including unified access, scaling, and robust and reproducible preprocessing. Code that uses WOMBAT for accessing word embeddings is not only cleaner, more readable, and easier to reuse, but also much more efficient than code using standard in-memory methods: a Python script using WOMBAT for evaluating seven large word embedding collections (8.7M embedding vectors in total) on a simple SemEval sentence similarity task involving 250 raw sentence pairs completes in under ten seconds end-to-end on a standard notebook computer.

Using Linguistic Features to Improve the Generalization Capability of Neural Coreference Resolvers
Nafise Sadat Moosavi | Michael Strube
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Coreference resolution is an intermediate step for text understanding. It is used in tasks and domains for which we do not necessarily have coreference annotated corpora. Therefore, generalization is of special importance for coreference resolution. However, while recent coreference resolvers have notable improvements on the CoNLL dataset, they struggle to generalize properly to new domains or datasets. In this paper, we investigate the role of linguistic features in building more generalizable coreference resolvers. We show that generalization improves only slightly by merely using a set of additional linguistic features. However, employing features and subsets of their values that are informative for coreference resolution, considerably improves generalization. Thanks to better generalization, our system achieves state-of-the-art results in out-of-domain evaluations, e.g., on WikiCoref, our system, which is trained on CoNLL, achieves on-par performance with a system designed for this dataset.

A Neural Local Coherence Model for Text Quality Assessment
Mohsen Mesgar | Michael Strube
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We propose a local coherence model that captures the flow of what semantically connects adjacent sentences in a text. We represent the semantics of a sentence by a vector and capture its state at each word of the sentence. We model what relates two adjacent sentences based on the two most similar semantic states, each of which is in one of the sentences. We encode the perceived coherence of a text by a vector, which represents patterns of changes in salient information that relates adjacent sentences. Our experiments demonstrate that our approach is beneficial for two downstream tasks: Readability assessment, in which our model achieves new state-of-the-art results; and essay scoring, in which the combination of our coherence vectors and other task-dependent features significantly improves the performance of a strong essay scorer.

Unrestricted Bridging Resolution
Yufang Hou | Katja Markert | Michael Strube
Computational Linguistics, Volume 44, Issue 2 - June 2018

In contrast to identity anaphors, which indicate coreference between a noun phrase and its antecedent, bridging anaphors link to their antecedent(s) via lexico-semantic, frame, or encyclopedic relations. Bridging resolution involves recognizing bridging anaphors and finding links to antecedents. In contrast to most prior work, we tackle both problems. Our work also follows a more wide-ranging definition of bridging than most previous work and does not impose any restrictions on the type of bridging anaphora or relations between anaphor and antecedent. We create a corpus (ISNotes) annotated for information status (IS), bridging being one of the IS subcategories. The annotations reach high reliability for all categories and marginal reliability for the bridging subcategory. We use a two-stage statistical global inference method for bridging resolution. Given all mentions in a document, the first stage, bridging anaphora recognition, recognizes bridging anaphors as a subtask of learning fine-grained IS. We use a cascading collective classification method where (i) collective classification allows us to investigate relations among several mentions and autocorrelation among IS classes and (ii) cascaded classification allows us to tackle class imbalance, important for minority classes such as bridging. We show that our method outperforms current methods both for IS recognition overall as well as for bridging, specifically. The second stage, bridging antecedent selection, finds the antecedents for all predicted bridging anaphors. We investigate the phenomenon of semantically or syntactically related bridging anaphors that share the same antecedent, a phenomenon we call sibling anaphors. We show that taking sibling anaphors into account in a joint inference model improves antecedent selection performance. In addition, we develop semantic and salience features for antecedent selection and suggest a novel method to build the candidate antecedent list for an anaphor, using the discourse scope of the anaphor. Our model outperforms previous work significantly.

BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages
Benjamin Heinzerling | Michael Strube
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Proceedings of the Second ACL Workshop on Ethics in Natural Language Processing
Mark Alfano | Dirk Hovy | Margaret Mitchell | Michael Strube
Proceedings of the Second ACL Workshop on Ethics in Natural Language Processing

2017

Revisiting Selectional Preferences for Coreference Resolution
Benjamin Heinzerling | Nafise Sadat Moosavi | Michael Strube
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Selectional preferences have long been claimed to be essential for coreference resolution. However, they are modeled only implicitly by current coreference resolvers. We propose a dependency-based embedding model of selectional preferences which allows fine-grained compatibility judgments with high coverage. Incorporating our model improves performance, matching state-of-the-art results of a more complex system. However, it comes with a cost that makes it debatable how worthwhile are such improvements.

Trust, but Verify! Better Entity Linking through Automatic Verification
Benjamin Heinzerling | Michael Strube | Chin-Yew Lin
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

We introduce automatic verification as a post-processing step for entity linking (EL). The proposed method trusts EL system results collectively, by assuming entity mentions are mostly linked correctly, in order to create a semantic profile of the given text using geospatial and temporal information, as well as fine-grained entity types. This profile is then used to automatically verify each linked mention individually, i.e., to predict whether it has been linked correctly or not. Verification allows leveraging a rich set of global and pairwise features that would be prohibitively expensive for EL systems employing global inference. Evaluation shows consistent improvements across datasets and systems. In particular, when applied to state-of-the-art systems, our method yields an absolute improvement in linking performance of up to 1.7 F1 on AIDA/CoNLL’03 and up to 2.4 F1 on the English TAC KBP 2015 TEDL dataset.

Event Argument Identification on Dependency Graphs with Bidirectional LSTMs
Alex Judea | Michael Strube
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In this paper we investigate the performance of event argument identification. We show that the performance is tied to syntactic complexity. Based on this finding, we propose a novel and effective system for event argument identification. Recurrent Neural Networks learn to produce meaningful representations of long and short dependency paths. Convolutional Neural Networks learn to decompose the lexical context of argument candidates. They are combined into a simple system which outperforms a feature-based, state-of-the-art event argument identifier without any manual feature engineering.

Proceedings of the IJCNLP 2017, Tutorial Abstracts
Sadao Kurohashi | Michael Strube
Proceedings of the IJCNLP 2017, Tutorial Abstracts

Lexical Features in Coreference Resolution: To be Used With Caution
Nafise Sadat Moosavi | Michael Strube
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Lexical features are a major source of information in state-of-the-art coreference resolvers. Lexical features implicitly model some of the linguistic phenomena at a fine granularity level. They are especially useful for representing the context of mentions. In this paper we investigate a drawback of using many lexical features in state-of-the-art coreference resolvers. We show that if coreference resolvers mainly rely on lexical features, they can hardly generalize to unseen domains. Furthermore, we show that the current coreference resolution evaluation is clearly flawed by only evaluating on a specific split of a specific dataset in which there is a notable overlap between the training, development and test sets.

Use Generalized Representations, But Do Not Forget Surface Features
Nafise Sadat Moosavi | Michael Strube
Proceedings of the 2nd Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2017)

Only a year ago, all state-of-the-art coreference resolvers were using an extensive amount of surface features. Recently, there was a paradigm shift towards using word embeddings and deep neural networks, where the use of surface features is very limited. In this paper, we show that a simple SVM model with surface features outperforms more complex neural models for detecting anaphoric mentions. Our analysis suggests that using generalized representations and surface features have different strength that should be both taken into account for improving coreference resolution.

Proceedings of the First ACL Workshop on Ethics in Natural Language Processing
Dirk Hovy | Shannon Spruit | Margaret Mitchell | Emily M. Bender | Michael Strube | Hanna Wallach
Proceedings of the First ACL Workshop on Ethics in Natural Language Processing

Using a Graph-based Coherence Model in Document-Level Machine Translation
Leo Born | Mohsen Mesgar | Michael Strube
Proceedings of the Third Workshop on Discourse in Machine Translation

Although coherence is an important aspect of any text generation system, it has received little attention in the context of machine translation (MT) so far. We hypothesize that the quality of document-level translation can be improved if MT models take into account the semantic relations among sentences during translation. We integrate the graph-based coherence model proposed by Mesgar and Strube, (2016) with Docent (Hardmeier et al., 2012, Hardmeier, 2014) a document-level machine translation system. The application of this graph-based coherence modeling approach is novel in the context of machine translation. We evaluate the coherence model and its effects on the quality of the machine translation. The result of our experiments shows that our coherence model slightly improves the quality of translation in terms of the average Meteor score.

2016

Incremental Global Event Extraction
Alex Judea | Michael Strube
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Event extraction is a difficult information extraction task. Li et al. (2014) explore the benefits of modeling event extraction and two related tasks, entity mention and relation extraction, jointly. This joint system achieves state-of-the-art performance in all tasks. However, as a system operating only at the sentence level, it misses valuable information from other parts of the document. In this paper, we present an incremental easy-first approach to make the global context of the entire document available to the intra-sentential, state-of-the-art event extractor. We show that our method robustly increases performance on two datasets, namely ACE 2005 and TAC 2015.

Generating Coherent Summaries of Scientific Articles Using Coherence Patterns
Daraksha Parveen | Mohsen Mesgar | Michael Strube
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

Search Space Pruning: A Simple Solution for Better Coreference Resolvers
Nafise Sadat Moosavi | Michael Strube
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Lexical Coherence Graph Modeling Using Word Embeddings
Mohsen Mesgar | Michael Strube
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Which Coreference Evaluation Metric Do You Trust? A Proposal for a Link-based Entity Aware Metric
Nafise Sadat Moosavi | Michael Strube
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Feature-Rich Error Detection in Scientific Writing Using Logistic Regression
Madeline Remse | Mohsen Mesgar | Michael Strube
Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications

Microblog Emotion Classification by Computing Similarity in Text, Time, and Space
Anja Summa | Bernd Resch | Michael Strube
Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES)

Most work in NLP analysing microblogs focuses on textual content thus neglecting temporal and spatial information. We present a new interdisciplinary method for emotion classification that combines linguistic, temporal, and spatial information into a single metric. We create a graph of labeled and unlabeled tweets that encodes the relations between neighboring tweets with respect to their emotion labels. Graph-based semi-supervised learning labels all tweets with an emotion.

2015

Topical Coherence for Graph-based Extractive Summarization
Daraksha Parveen | Hans-Martin Ramsl | Michael Strube
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Analyzing and Visualizing Coreference Resolution Errors
Sebastian Martschat | Thierry Göckel | Michael Strube
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Chengqing Zong | Michael Strube
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Chengqing Zong | Michael Strube
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Visual Error Analysis for Entity Linking
Benjamin Heinzerling | Michael Strube
Proceedings of ACL-IJCNLP 2015 System Demonstrations

Plug Latent Structures and Play Coreference Resolution
Sebastian Martschat | Patrick Claus | Michael Strube
Proceedings of ACL-IJCNLP 2015 System Demonstrations

Latent Structures for Coreference Resolution
Sebastian Martschat | Michael Strube
Transactions of the Association for Computational Linguistics, Volume 3

Machine learning approaches to coreference resolution vary greatly in the modeling of the problem: while early approaches operated on the mention pair level, current research focuses on ranking architectures and antecedent trees. We propose a unified representation of different approaches to coreference resolution in terms of the structure they operate on. We represent several coreference resolution approaches proposed in the literature in our framework and evaluate their performance. Finally, we conduct a systematic analysis of the output of these approaches, highlighting differences and similarities.

Event Extraction as Frame-Semantic Parsing
Alex Judea | Michael Strube
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics

Graph-based Coherence Modeling For Assessing Readability
Mohsen Mesgar | Michael Strube
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics

2014

Unsupervised Coreference Resolution by Utilizing the Most Informative Relations
Nafise Sadat Moosavi | Michael Strube
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Recall Error Analysis for Coreference Resolution
Sebastian Martschat | Michael Strube
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

A Rule-Based System for Unrestricted Bridging Resolution: Recognizing Bridging Anaphora and Finding Links to Antecedents
Yufang Hou | Katja Markert | Michael Strube
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

A Latent Variable Model for Discourse-aware Concept and Entity Disambiguation
Angela Fahrni | Michael Strube
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

Scoring Coreference Partitions of Predicted Mentions: A Reference Implementation
Sameer Pradhan | Xiaoqiang Luo | Marta Recasens | Eduard Hovy | Vincent Ng | Michael Strube
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Normalized Entity Graph for Computing Local Coherence
Mohsen Mesgar | Michael Strube
Proceedings of TextGraphs-9: the workshop on Graph-based Methods for Natural Language Processing

Multi-document Summarization Using Bipartite Graphs
Daraksha Parveen | Michael Strube
Proceedings of TextGraphs-9: the workshop on Graph-based Methods for Natural Language Processing

2013

Cascading Collective Classification for Bridging Anaphora Recognition using a Rich Linguistic Feature Set
Yufang Hou | Katja Markert | Michael Strube
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Global Inference for Bridging Anaphora Resolution
Yufang Hou | Katja Markert | Michael Strube
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Graph-based Local Coherence Modeling
Camille Guinaudeau | Michael Strube
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Proceedings of the SIGDIAL 2013 Conference
Maxine Eskenazi | Michael Strube | Barbara Di Eugenio | Jason D. Williams
Proceedings of the SIGDIAL 2013 Conference

2012

Jointly Disambiguating and Clustering Concepts and Entities with Markov Logic
Angela Fahrni | Michael Strube
Proceedings of COLING 2012

Local and Global Context for Supervised and Unsupervised Metonymy Resolution
Vivi Nastase | Alex Judea | Katja Markert | Michael Strube
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Concept-based Selectional Preferences and Distributional Representations from Wikipedia Articles
Alex Judea | Vivi Nastase | Michael Strube
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper describes the derivation of distributional semantic representations for open class words relative to a concept inventory, and of concepts relative to open class words through grammatical relations extracted from Wikipedia articles. The concept inventory comes from WikiNet, a large-scale concept network derived from Wikipedia. The distinctive feature of these representations are their relation to a concept network, through which we can compute selectional preferences of open-class words relative to general concepts. The resource thus derived provides a meaning representation that complements the relational representation captured in the concept network. It covers English open-class words, but the concept base is language independent. The resource can be extended to other languages, with the use of language specific dependency parsers. Good results in metonymy resolution show the resource's potential use for NLP applications.

Collective Classification for Fine-grained Information Status
Katja Markert | Yufang Hou | Michael Strube
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts
Michael Strube
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

A Multigraph Model for Coreference Resolution
Sebastian Martschat | Jie Cai | Samuel Broscheit | Éva Mújdricza-Maydt | Michael Strube
Joint Conference on EMNLP and CoNLL - Shared Task

2011

Fine-Grained Sentiment Analysis with Structural Features
Cäcilia Zirn | Mathias Niepert | Heiner Stuckenschmidt | Michael Strube
Proceedings of 5th International Joint Conference on Natural Language Processing

WikiNetTK – A Tool Kit for EmbeddingWorld Knowledge in NLP Applications
Alex Judea | Vivi Nastase | Michael Strube
Proceedings of the IJCNLP 2011 System Demonstrations

Unrestricted Coreference Resolution via Global Hypergraph Partitioning
Jie Cai | Éva Mújdricza-Maydt | Michael Strube
Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task

2010

End-to-End Coreference Resolution via Hypergraph Partitioning
Jie Cai | Michael Strube
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

WikiNet: A Very Large Scale Multi-Lingual Concept Network
Vivi Nastase | Michael Strube | Benjamin Boerschinger | Caecilia Zirn | Anas Elghafari
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper describes a multi-lingual large-scale concept network obtained automatically by mining for concepts and relations and exploiting a variety of sources of knowledge from Wikipedia. Concepts and their lexicalizations are extracted from Wikipedia pages, in particular from article titles, hyperlinks, disambiguation pages and cross-language links. Relations are extracted from the category and page network, from the category names, from infoboxes and the body of the articles. The resulting network has two main components: (i) a central, language independent index of concepts, which serves to keep track of the concepts' lexicalizations both within a language and across languages, and to separate linguistic expressions of concepts from the relations in which they are involved (concepts themselves are represented as numeric IDs); (ii) a large network built on the basis of the relations extracted, represented as relations between concepts (more specifically, the numeric IDs). The various stages of obtaining the network were separately evaluated, and the results show a qualitative resource.

Evaluation Metrics For End-to-End Coreference Resolution Systems
Jie Cai | Michael Strube
Proceedings of the SIGDIAL 2010 Conference

2009

Combining Collocations, Lexical and Encyclopedic Knowledge for Metonymy Resolution
Vivi Nastase | Michael Strube
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

Tree Linearization in English: Improving Language Model Based Approaches
Katja Filippova | Michael Strube
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

Extracting World and Linguistic Knowledge from Wikipedia
Simone Paolo Ponzetto | Michael Strube
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Tutorial Abstracts

Finding Hedges by Chasing Weasels: Hedge Detection Using Wikipedia Tags and Shallow Linguistic Features
Viola Ganter | Michael Strube
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

Creating an Annotated Corpus for Generating Walking Directions
Stephanie Schuldes | Michael Roth | Anette Frank | Michael Strube
Proceedings of the 2009 Workshop on Language Generation and Summarisation (UCNLG+Sum 2009)

2008

Sentence Fusion via Dependency Graph Compression
Katja Filippova | Michael Strube
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

Knowledge Sources for Bridging Resolution in Multi-Party Dialog
Mark-Christoph Mueller | Margot Mieskes | Michael Strube
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper we investigate the coverage of the two knowledge sources WordNet and Wikipedia for the task of bridging resolution. We report on an annotation experiment which yielded pairs of bridging anaphors and their antecedents in spoken multi-party dialog. Manual inspection of the two knowledge sources showed that, with some interesting exceptions, Wikipedia is superior to WordNet when it comes to the coverage of information necessary to resolve the bridging anaphors in our data set. We further describe a simple procedure for the automatic extraction of the required knowledge from Wikipedia by means of an API, and discuss some of the implications of the procedures performance.

A Three-stage Disfluency Classifier for Multi Party Dialogues
Margot Mieskes | Michael Strube
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We present work on a three-stage system to detect and classify disfluencies in multi party dialogues. The system consists of a regular expression based module and two machine learning based modules. The results are compared to other work on multi party dialogues and we show that our system outperforms previously reported ones.

Acquiring a Taxonomy from the German Wikipedia
Laura Kassner | Vivi Nastase | Michael Strube
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents the process of acquiring a large, domain independent, taxonomy from the German Wikipedia. We build upon a previously implemented platform that extracts a semantic network and taxonomy from the English version of the Wikipedia. We describe two accomplishments of our work: the semantic network for the German language in which isa links are identified and annotated, and an expansion of the platform for easy adaptation for a new language. We identify the platforms strengths and shortcomings, which stem from the scarcity of free processing resources for languages other than English. We show that the taxonomy induction process is highly reliable - evaluated against the German version of WordNet, GermaNet, the resource obtained shows an accuracy of 83.34%.

Parameters for Topic Boundary Detection in Multi-Party Dialogues
Margot Mieskes | Michael Strube
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We present a topic boundary detection method that searches for connections between sequences of utterances in multi party dialogues. The connections are established based on word identity. We compare our method to a state-of-the art automatic Topic boundary detection method that was also used on multi party dialogues. We checked various methods of preprocessing of the data, including stemming, lemmatization and stopword filtering with a text-based as well as speech-based stopword lists. Using standard evaluation methods we found that our method outperformed the state-of-the art method.

Dependency Tree Based Sentence Compression
Katja Filippova | Michael Strube
Proceedings of the Fifth International Natural Language Generation Conference

2007

Generating Constituent Order in German Clauses
Katja Filippova | Michael Strube
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

An API for Measuring the Relatedness of Words in Wikipedia
Simone Paolo Ponzetto | Michael Strube
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

Extending the Entity-grid Coherence Model to Semantically Related Entities
Katja Filippova | Michael Strube
Proceedings of the Eleventh European Workshop on Natural Language Generation (ENLG 07)

2006

Semantic Role Labeling for Coreference Resolution
Simone Paolo Ponzetto | Michael Strube
Demonstrations

Part-of-Speech Tagging of Transcribed Speech
Margot Mieskes | Michael Strube
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We used four Part-of-Speech taggers, which are available for research purposes and were originally trained on text to tag a corpus of transcribed multiparty spoken dialogues. The assigned tags were then manually corrected. The correction was first used to evaluate the four taggers, then to retrain them. Despite limited resources in time, money and annotators we reached results comparable to those reported for the taggers on text. Based on our experience we present guidelines to produce reliably POS tagged corpora of new domains.

Exploiting Semantic Role Labeling, WordNet and Wikipedia for Coreference Resolution
Simone Paolo Ponzetto | Michael Strube
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

Using linguistically motivated features for paragraph boundary identification
Katja Filippova | Michael Strube
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

2005

Beyond the Pipeline: Discrete Optimization in NLP
Tomasz Marciniak | Michael Strube
Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005)

Semantic Role Labeling Using Lexical Statistical Information
Simone Paolo Ponzetto | Michael Strube
Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005)

Discrete Optimization as an Alternative to Sequential Processing in NLG
Tomasz Marciniak | Michael Strube
Proceedings of the Tenth European Workshop on Natural Language Generation (ENLG-05)

2004

Semantic Similarity Applied to Spoken Dialogue Summarization
Iryna Gurevych | Michael Strube
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2003

Book Reviews: Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms by Thorsten Joachims; Anaphora Resolution by Ruslan Mitkov
Roberto Basili | Michael Strube
Computational Linguistics, Volume 29, Number 4, December 2003

A Machine Learning Approach to Pronoun Resolution in Spoken Dialogue
Michael Strube | Christoph Müller
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

Multi-Level Annotation in MMAX
Christoph Müller | Michael Strube
Proceedings of the Fourth SIGdial Workshop of Discourse and Dialogue

2002

An Iterative Data Collection Approach for Multimodal Dialogue Systems
Stefan Rapp | Michael Strube
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

An API for Discourse-level Access to XML-encoded Corpora
Christoph Müller | Michael Strube
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

Applying Co-Training to Reference Resolution
Christoph Mueller | Stefan Rapp | Michael Strube
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

Annotating the Semantic Consistency of Speech Recognition Hypotheses
Iryna Gurevych | Robert Porzel | Michael Strube
Proceedings of the Third SIGdial Workshop on Discourse and Dialogue

The Influence of Minimum Edit Distance on Reference Resolution
Michael Strube | Stefan Rapp | Christoph Müller
Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002)

2001

Annotating Anaphoric and Bridging Relations with MMAX
Christoph Mueller | Michael Strube
Proceedings of the Second SIGdial Workshop on Discourse and Dialogue

2000

A Probabilistic Genre-Independent Model of Pronominalization
Michael Strube | Maria Wolters
1st Meeting of the North American Chapter of the Association for Computational Linguistics

1999

Resolving Discourse Deictic Anaphora in Dialogues
Miriam Eckert | Michael Strube
Ninth Conference of the European Chapter of the Association for Computational Linguistics

Functional Centering – Grounding Referential Coherence on Information Structure
Michael Strube | Udo Hahn
Computational Linguistics, Volume 25, Number 3, September 1999

Building a Tool for Annotating Reference in Discourse
Jonathan DeCristofaro | Michael Strube | Kathleen E. McCoy
The Relation of Discourse/Dialogue Structure and Reference

Generating Anaphoric Expressions: Pronoun or Definite Description?
Kathleen E. McCoy | Michael Strube
The Relation of Discourse/Dialogue Structure and Reference

1998

Never Look Back: An Alternative to Centering
Michael Strube
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

Never Look Back: An Alternative to Centering
Michael Strube
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

1997

Centering in-the-Large: Computing Referential Discourse Segments
Udo Hahn | Michael Strube
35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics

1996

Bridging Textual Ellipses
Udo Hahn | Michael Strube | Katja Markert
COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics

Functional Centering
Michael Strube | Udo Hahn
34th Annual Meeting of the Association for Computational Linguistics

Processing Complex Sentences in the Centering Framework
Michael Strube
34th Annual Meeting of the Association for Computational Linguistics

1995

ParseTalk about Sentence- and Text-Level Anaphora
Michael Strube | Udo Hahn
Seventh Conference of the European Chapter of the Association for Computational Linguistics

Co-authors

Katja Filippova 6

Christian Hardmeier 6

Junyi Jessy Li 6

Christoph Müller 6

Sebastian Martschat 5

Massimo Poesio 5

Simone Paolo Ponzetto 5

Sharid Loáiciga 4

Ramesh Manuvinakurike 4

Margot Mieskes 4

Mehwish Fatima 3

Iryna Gurevych 3

Mark-Christoph Müller 3

Daraksha Parveen 3

Angela Fahrni 2

Federico López 2

Tomasz Marciniak 2

Kathleen F. McCoy 2

Margaret Mitchell 2

Éva Mújdricza-Maydt 2

Cäcilia Zirn 2

Chengqing Zong 2

Kevin Alex Mathews 1

Souvik Banerjee 1

Roberto Basili 1

Emily M. Bender 1

Samuel Broscheit 1

Benjamin Börschinger 1

Patrick Claus 1

Jonathan DeCristofaro 1

Barbara Di Eugenio 1

Miriam Eckert 1

Anas Elghafari 1

Maxine Eskenazi 1

Matthias Fink 1

Sucheta Ghosh 1

Camille Guinaudeau 1

Thierry Göckel 1

Laura Kassner 1

Jens Kleesiek 1

Anna Korhonen 1

Sadao Kurohashi 1

Xiaoqiang Luo 1

Klaus Maier-Hein 1

Wolfgang Müller 1

Mathias Niepert 1

Robert Porzel 1

Sameer Pradhan 1

Hans-Martin Ramsl 1

Marta Recasens 1

Madeline Remse 1

Stephanie Schuldes 1

Ivan Sekulić 1

Shannon L. Spruit 1

Heiner Stuckenschmidt 1

Hanna Wallach 1

Jason D. Williams 1

Ulrike Wittig 1

Maria Wolters 1

Venues