Dominik Schlechtweg

2025

CoMeDi Shared Task: Median Judgment Classification & Mean Disagreement Ranking with Ordinal Word-in-Context Judgments
Dominik Schlechtweg | Tejaswi Choppa | Wei Zhao | Michael Roth
Proceedings of Context and Meaning: Navigating Disagreements in NLP Annotation

We asked task participants to solve two subtasks given a pair of word usages: Ordinal Graded Word-in-Context Classification (OGWiC) and Disagreement in Word-in-Context Ranking (DisWiC). The tasks take a different view on modeling of word meaning by (i) treating WiC as an ordinal classification task, and (ii) making disagreement the explicit detection aim (instead of removing it). OGWiC is solved with relatively high performance while DisWiC proves to be a challenging task. In both tasks, the dominating model architecture uses independently optimized binary Word-in-Context models.

Predicting Median, Disagreement and Noise Label in Ordinal Word-in-Context Data
Tejaswi Choppa | Michael Roth | Dominik Schlechtweg
Proceedings of Context and Meaning: Navigating Disagreements in NLP Annotation

The quality of annotated data is crucial for Machine Learning models, particularly in word sense annotation in context (Word-in-Context, WiC). WiC datasets often show significant annotator disagreement, and information is lost when creating gold labels through majority or median aggregation. Recent work has addressed this by incorporating disagreement data through new label aggregation methods. Modeling disagreement is important since real-world scenarios often lack clean data and require predictions on inherently difficult samples. Disagreement prediction can help detect complex cases or reflect inherent data ambiguity. We aim to model different aspects of ordinal Word-in-Context annotations necessary to build a more human-like model: (i) the aggregated label, which has traditionally been the modeling aim, (ii) the disagreement between annotators, and (iii) the aggregated noise label which annotators can choose to exclude data points from annotation. We find that disagreement and noise are impacted by various properties of data like ambiguity, which in turn points to data uncertainty.

Proceedings of Context and Meaning: Navigating Disagreements in NLP Annotation
Michael Roth | Dominik Schlechtweg
Proceedings of Context and Meaning: Navigating Disagreements in NLP Annotation

XL-DURel: Finetuning Sentence Transformers for Ordinal Word-in-Context Classification
Sachin Yadav | Dominik Schlechtweg
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

We propose XL-DURel, a finetuned, multilingual Sentence Transformer model optimized for ordinal Word-in-Context classification. We test several loss functions for regression and ranking tasks, managing to outperform previous models on ordinal and binary data with a ranking objective based on angular distance in complex space. We further show that binary WiC can be treated as a special case of ordinal WiC and that optimizing models for the general ordinal task improves performance on the more specific binary task. This paves the way for a unified treatment of WiC modeling across different task formulations.

ABDN-NLP at CoMeDi Shared Task: Predicting the Aggregated Human Judgment via Weighted Few-Shot Prompting
Ying Xuan Loke | Dominik Schlechtweg | Wei Zhao
Proceedings of Context and Meaning: Navigating Disagreements in NLP Annotation

Human annotation is notorious for being subjective and expensive. Recently, (CITATION) introduced the CoMeDi shared task aiming to address this issue by predicting human annotations on the semantic proximity between word uses, and estimating the variation of the human annotations. However, distinguishing the proximity between word uses can be challenging when their semantic difference is subtle. In this work, we focus on predicting the aggregated annotator judgment of semantic proximity by using a large language model fine-tuned on 20 examples with various proximity classes. To distinguish nuanced proximity, we propose a weighted few-shot approach that pays greater attention to the proximity classes identified as important during fine-tuning. We evaluate our approach in the CoMeDi shared task across 7 languages. Our results demonstrate the superiority of our approach over zero-shot and standard few-shot counterparts. While useful, the weighted few-shot approach should be applied with caution, given that it relies on development sets to compute the importance of proximity classes, and thus may not generalize well to real-world scenarios where the distribution of class importance is different.

2024

TRoTR: A Framework for Evaluating the Re-contextualization of Text Reuse
Francesco Periti | Pierluigi Cassotti | Stefano Montanelli | Nina Tahmasebi | Dominik Schlechtweg
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Current approaches for detecting text reuse do not focus on recontextualization, i.e., how the new context(s) of a reused text differs from its original context(s). In this paper, we propose a novel framework called TRoTR that relies on the notion of topic relatedness for evaluating the diachronic change of context in which text is reused. TRoTR includes two NLP tasks: TRiC and TRaC. TRiC is designed to evaluate the topic relatedness between a pair of recontextualizations. TRaC is designed to evaluate the overall topic variation within a set of recontextualizations. We also provide a curated TRoTR benchmark of biblical text reuse, human-annotated with topic relatedness. The benchmark exhibits an inter-annotator agreement of .811. We evaluate multiple, established SBERT models on the TRoTR tasks and find that they exhibit greater sensitivity to textual similarity than topic relatedness. Our experiments show that fine-tuning these models can mitigate this kind of sensitivity.

More DWUGs: Extending and Evaluating Word Usage Graph Datasets in Multiple Languages
Dominik Schlechtweg | Pierluigi Cassotti | Bill Noble | David Alfter | Sabine Schulte Im Walde | Nina Tahmasebi
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Word Usage Graphs (WUGs) represent human semantic proximity judgments for pairs of word uses in a weighted graph, which can be clustered to infer word sense clusters from simple pairwise word use judgments, avoiding the need for word sense definitions. SemEval-2020 Task 1 provided the first and to date largest manually annotated, diachronic WUG dataset. In this paper, we check the robustness and correctness of the annotations by continuing the SemEval annotation algorithm for two more rounds and comparing against an established annotation paradigm. Further, we test the reproducibility by resampling a new, smaller set of word uses from the SemEval source corpora and annotating them. Our work contributes to a better understanding of the problems and opportunities of the WUG annotation paradigm and points to future improvements.

Enriching Word Usage Graphs with Cluster Definitions
Andrey Kutuzov | Mariia Fedorova | Dominik Schlechtweg | Nikolay Arefyev
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We present a dataset of word usage graphs (WUGs), where the existing WUGs for multiple languages are enriched with cluster labels functioning as sense definitions. They are generated from scratch by fine-tuned encoder-decoder language models. The conducted human evaluation has shown that these definitions match the existing clusters in WUGs better than the definitions chosen from WordNet by two baseline systems. At the same time, the method is straightforward to use and easy to extend to new languages. The resulting enriched datasets can be extremely helpful for moving on to explainable semantic change modeling.

Presence or Absence: Are Unknown Word Usages in Dictionaries?
Xianghe Ma | Dominik Schlechtweg | Wei Zhao
Proceedings of the 5th Workshop on Computational Approaches to Historical Language Change

The DURel Annotation Tool: Human and Computational Measurement of Semantic Proximity, Sense Clusters and Semantic Change
Dominik Schlechtweg | Shafqat Mumtaz Virk | Pauline Sander | Emma Sköldberg | Lukas Theuer Linke | Tuo Zhang | Nina Tahmasebi | Jonas Kuhn | Sabine Schulte Im Walde
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

We present the DURel tool implementing the annotation of semantic proximity between word uses into an online, open source interface. The tool supports standardized human annotation as well as computational annotation, building on recent advances with Word-in-Context models. Annotator judgments are clustered with automatic graph clustering techniques and visualized for analysis. This makes it possible to measure word senses with simple and intuitive micro-task judgments between use pairs, requiring minimal preparation efforts. The tool offers additional functionalities to compare the agreement between annotators to guarantee the inter-subjectivity of the obtained judgments and to calculate summary statistics over the annotated data giving insights into sense frequency distributions, semantic variation or changes of senses over time.

2023

ChiWUG: A Graph-based Evaluation Dataset for Chinese Lexical Semantic Change Detection
Jing Chen | Emmanuele Chersoni | Dominik Schlechtweg | Jelena Prokic | Chu-Ren Huang
Proceedings of the 4th Workshop on Computational Approaches to Historical Language Change

Recent studies suggested that language models are efficient tools for measuring lexical semantic change. In our paper, we present the compilation of the first graph-based evaluation dataset for lexical semantic change in the context of the Chinese language, specifically covering the periods of pre- and post- Reform and Opening Up. Exploiting the existing framework DURel, we collect over 61,000 human semantic relatedness judgments for 40 targets. The inferred word usage graphs and semantic change scores provide a basis for visualization and evaluation of semantic change.

2022

LSCDiscovery: A shared task on semantic change discovery and detection in Spanish
Frank D. Zamora-Reina | Felipe Bravo-Marquez | Dominik Schlechtweg
Proceedings of the 3rd Workshop on Computational Approaches to Historical Language Change

We present the first shared task on semantic change discovery and detection in Spanish. We create the first dataset of Spanish words manually annotated for semantic change using the DURel framework (Schlechtweg et al., 2018). The task is divided into two phases: 1) graded change discovery, and 2) binary change detection. In addition to introducing a new language for this task, the main novelty with respect to the previous tasks consists in predicting and evaluating changes for all vocabulary words in the corpus. Six teams participated in phase 1 and seven teams in phase 2 of the shared task, and the best system obtained a Spearman rank correlation of 0.735 for phase 1 and an F1 score of 0.735 for phase 2. We describe the systems developed by the competing teams, highlighting the techniques that were particularly useful.

DiaWUG: A Dataset for Diatopic Lexical Semantic Variation in Spanish
Gioia Baldissin | Dominik Schlechtweg | Sabine Schulte im Walde
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We provide a novel dataset – DiaWUG – with judgements on diatopic lexical semantic variation for six Spanish variants in Europe and Latin America. In contrast to most previous meaning-based resources and studies on semantic diatopic variation, we collect annotations on semantic relatedness for Spanish target words in their contexts from both a semasiological perspective (i.e., exploring the meanings of a word given its form, thus including polysemy) and an onomasiological perspective (i.e., exploring identical meanings of words with different forms, thus including synonymy). In addition, our novel dataset exploits and extends the existing framework DURel for annotating word senses in context (Erk et al., 2013; Schlechtweg et al., 2018) and the framework-embedded Word Usage Graphs (WUGs) – which up to now have mainly been used for semasiological tasks and resources – in order to distinguish, visualize and interpret lexical semantic variation of contextualized words in Spanish from these two perspectives, i.e., semasiological and onomasiological language variation.

2021

More than just Frequency? Demasking Unsupervised Hypernymy Prediction Methods
Thomas Bott | Dominik Schlechtweg | Sabine Schulte im Walde
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Regression Analysis of Lexical and Morpho-Syntactic Properties of Kiezdeutsch
Diego Frassinelli | Gabriella Lapesa | Reem Alatrash | Dominik Schlechtweg | Sabine Schulte im Walde
Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects

Kiezdeutsch is a variety of German predominantly spoken by teenagers from multi-ethnic urban neighborhoods in casual conversations with their peers. In recent years, the popularity of Kiezdeutsch has increased among young people, independently of their socio-economic origin, and has spread in social media, too. While previous studies have extensively investigated this language variety from a linguistic and qualitative perspective, not much has been done from a quantitative point of view. We perform the first large-scale data-driven analysis of the lexical and morpho-syntactic properties of Kiezdeutsch in comparison with standard German. At the level of results, we confirm predictions of previous qualitative analyses and integrate them with further observations on specific linguistic phenomena such as slang and self-centered speaker attitude. At the methodological level, we provide logistic regression as a framework to perform bottom-up feature selection in order to quantify differences across language varieties.

Lexical Semantic Change Discovery
Sinan Kurtyigit | Maike Park | Dominik Schlechtweg | Jonas Kuhn | Sabine Schulte im Walde
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

While there is a large amount of research in the field of Lexical Semantic Change Detection, only a few approaches go beyond a standard benchmark evaluation of existing models. In this paper, we propose a shift of focus from change detection to change discovery, i.e., discovering novel word senses over time from the full corpus vocabulary. By heavily fine-tuning a type-based and a token-based approach on recently published German data, we demonstrate that both models can successfully be applied to discover new words undergoing meaning change. Furthermore, we provide an almost fully automated framework for both evaluation and discovery.

DWUG: A large Resource of Diachronic Word Usage Graphs in Four Languages
Dominik Schlechtweg | Nina Tahmasebi | Simon Hengchen | Haim Dubossarsky | Barbara McGillivray
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Word meaning is notoriously difficult to capture, both synchronically and diachronically. In this paper, we describe the creation of the largest resource of graded contextualized, diachronic word meaning annotation in four different languages, based on 100,000 human semantic proximity judgments. We describe in detail the multi-round incremental annotation process, the choice for a clustering algorithm to group usages into senses, and possible – diachronic and synchronic – uses for this dataset.

Modeling Sense Structure in Word Usage Graphs with the Weighted Stochastic Block Model
Dominik Schlechtweg | Enrique Castaneda | Jonas Kuhn | Sabine Schulte im Walde
Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics

We suggest modeling human-annotated Word Usage Graphs, which capture fine-grained semantic proximity distinctions between word uses, with a Bayesian formulation of the Weighted Stochastic Block Model, a generative model for random graphs popular in biology, physics and social sciences. By providing a probabilistic model of graded word meaning we aim to approach the slippery and yet widely used notion of word sense in a novel way. The proposed framework enables us to rigorously compare models of word senses with respect to their fit to the data. We perform extensive experiments and select the empirically most adequate model.

Effects of Pre- and Post-Processing on type-based Embeddings in Lexical Semantic Change Detection
Jens Kaiser | Sinan Kurtyigit | Serge Kotchourko | Dominik Schlechtweg
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Lexical semantic change detection is a new and innovative research field. The optimal fine-tuning of models including pre- and post-processing is largely unclear. We optimize existing models by (i) pre-training on large corpora and refining on diachronic target corpora tackling the notorious small data problem, and (ii) applying post-processing transformations that have been shown to improve performance on synchronic tasks. Our results provide a guide for the application and optimization of lexical semantic change detection models across various learning scenarios.

Explaining and Improving BERT Performance on Lexical Semantic Change Detection
Severin Laicher | Sinan Kurtyigit | Dominik Schlechtweg | Jonas Kuhn | Sabine Schulte im Walde
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop

Type- and token-based embedding architectures are still competing in lexical semantic change detection. The recent success of type-based models in SemEval-2020 Task 1 has raised the question why the success of token-based models on a variety of other NLP tasks does not translate to our field. We investigate the influence of a range of variables on clusterings of BERT vectors and show that its low performance is largely due to orthographic information on the target word, which is encoded even in the higher layers of BERT representations. By reducing the influence of orthography we considerably improve BERT’s performance.

2020

IMS at SemEval-2020 Task 1: How Low Can You Go? Dimensionality in Lexical Semantic Change Detection
Jens Kaiser | Dominik Schlechtweg | Sean Papay | Sabine Schulte im Walde
Proceedings of the Fourteenth Workshop on Semantic Evaluation

We present the results of our system for SemEval-2020 Task 1 that exploits a commonly used lexical semantic change detection model based on Skip-Gram with Negative Sampling. Our system focuses on Vector Initialization (VI) alignment, compares VI to the currently top-ranking models for Subtask 2 and demonstrates that these can be outperformed if we optimize VI dimensionality. We demonstrate that differences in performance can largely be attributed to model-specific sources of noise, and we reveal a strong relationship between dimensionality and frequency-induced noise in VI alignment. Our results suggest that lexical semantic change models integrating vector space alignment should pay more attention to the role of the dimensionality parameter.

SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection
Dominik Schlechtweg | Barbara McGillivray | Simon Hengchen | Haim Dubossarsky | Nina Tahmasebi
Proceedings of the Fourteenth Workshop on Semantic Evaluation

Lexical Semantic Change detection, i.e., the task of identifying words that change meaning over time, is a very active research area, with applications in NLP, lexicography, and linguistics. Evaluation is currently the most pressing problem in Lexical Semantic Change detection, as no gold standards are available to the community, which hinders progress. We present the results of the first shared task that addresses this gap by providing researchers with an evaluation framework and manually annotated, high-quality datasets for English, German, Latin, and Swedish. 33 teams submitted 186 systems, which were evaluated on two subtasks.

CCOHA: Clean Corpus of Historical American English
Reem Alatrash | Dominik Schlechtweg | Jonas Kuhn | Sabine Schulte im Walde
Proceedings of the Twelfth Language Resources and Evaluation Conference

Modelling language change is an increasingly important area of interest within the fields of sociolinguistics and historical linguistics. In recent years, there has been a growing number of publications whose main concern is studying changes that have occurred within the past centuries. The Corpus of Historical American English (COHA) is one of the most commonly used large corpora in diachronic studies in English. This paper describes methods applied to the downloadable version of the COHA corpus in order to overcome its main limitations, such as inconsistent lemmas and malformed tokens, without compromising its qualitative and distributional properties. The resulting corpus CCOHA contains a larger number of cleaned word tokens which can offer better insights into language change and allow for a larger variety of tasks to be performed.

Predicting Degrees of Technicality in Automatic Terminology Extraction
Anna Hätty | Dominik Schlechtweg | Michael Dorna | Sabine Schulte im Walde
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

While automatic term extraction is a well-researched area, computational approaches to distinguish between degrees of technicality are still understudied. We semi-automatically create a German gold standard of technicality across four domains, and illustrate the impact of a web-crawled general-language corpus on technicality prediction. When defining a classification approach that combines general-language and domain-specific word embeddings, we go beyond previous work and align vector spaces to gain comparative embeddings. We suggest two novel models to exploit general- vs. domain-specific comparisons: a simple neural network model with pre-computed comparative-embedding information as input, and a multi-channel model computing the comparison internally. Both models outperform previous approaches, with the multi-channel model performing best.

2019

Second-order Co-occurrence Sensitivity of Skip-Gram with Negative Sampling
Dominik Schlechtweg | Cennet Oguz | Sabine Schulte im Walde
Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

We simulate first- and second-order context overlap and show that Skip-Gram with Negative Sampling is similar to Singular Value Decomposition in capturing second-order co-occurrence information, while Pointwise Mutual Information is agnostic to it. We support the results with an empirical study finding that the models react differently when provided with additional second-order information. Our findings reveal a basic property of Skip-Gram with Negative Sampling and point towards an explanation of its success on a variety of tasks.

A Wind of Change: Detecting and Evaluating Lexical Semantic Change across Times and Domains
Dominik Schlechtweg | Anna Hätty | Marco Del Tredici | Sabine Schulte im Walde
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We perform an interdisciplinary large-scale evaluation for detecting lexical semantic divergences in a diachronic and in a synchronic task: semantic sense changes across time, and semantic sense changes across domains. Our work addresses the superficialness and lack of comparison in assessing models of diachronic lexical change, by bringing together and extending benchmark models on a common state-of-the-art evaluation task. In addition, we demonstrate that the same evaluation task and modelling approaches can successfully be utilised for the synchronic detection of domain-specific sense divergences in the field of term extraction.

SURel: A Gold Standard for Incorporating Meaning Shifts into Term Extraction
Anna Hätty | Dominik Schlechtweg | Sabine Schulte im Walde
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

We introduce SURel, a novel dataset with human-annotated meaning shifts between general-language and domain-specific contexts. We show that meaning shifts of term candidates cause errors in term extraction, and demonstrate that the SURel annotation reflects these errors. Furthermore, we illustrate that SURel enables us to assess optimisations of term extraction techniques when incorporating meaning shifts.

Time-Out: Temporal Referencing for Robust Modeling of Lexical Semantic Change
Haim Dubossarsky | Simon Hengchen | Nina Tahmasebi | Dominik Schlechtweg
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

State-of-the-art models of lexical semantic change detection suffer from noise stemming from vector space alignment. We have empirically tested the Temporal Referencing method for lexical semantic change and show that, by avoiding alignment, it is less affected by this noise. We show that, trained on a diachronic corpus, the skip-gram with negative sampling architecture with temporal referencing outperforms alignment models on a synthetic task as well as a manual testset. We introduce a principled way to simulate lexical semantic change and systematically control for possible biases.

2018

Diachronic Usage Relatedness (DURel): A Framework for the Annotation of Lexical Semantic Change
Dominik Schlechtweg | Sabine Schulte im Walde | Stefanie Eckmann
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

We propose a framework that extends synchronic polysemy annotation to diachronic changes in lexical meaning, to counteract the lack of resources for evaluating computational models of lexical semantic change. Our framework exploits an intuitive notion of semantic relatedness, and distinguishes between innovative and reductive meaning changes with high inter-annotator agreement. The resulting test set for German comprises ratings from five annotators for the relatedness of 1,320 use pairs across 22 target words.

2017

German in Flux: Detecting Metaphoric Change via Word Entropy
Dominik Schlechtweg | Stefanie Eckmann | Enrico Santus | Sabine Schulte im Walde | Daniel Hole
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

This paper explores the information-theoretic measure entropy to detect metaphoric change, transferring ideas from hypernym detection to research on language change. We build the first diachronic test set for German as a standard for metaphoric change annotation. Our model is unsupervised, language-independent and generalizable to other processes of semantic change.

Hypernyms under Siege: Linguistically-motivated Artillery for Hypernymy Detection
Vered Shwartz | Enrico Santus | Dominik Schlechtweg
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

The fundamental role of hypernymy in NLP has motivated the development of many methods for the automatic identification of this relation, most of which rely on word distribution. We investigate an extensive number of such unsupervised measures, using several distributional semantic models that differ by context type and feature weighting. We analyze the performance of the different methods based on their linguistic motivation. Comparison to the state-of-the-art supervised methods shows that while supervised methods generally outperform the unsupervised ones, the former are sensitive to the distribution of training instances, hurting their reliability. Being based on general linguistic hypotheses and independent from training data, unsupervised measures are more robust, and therefore are still useful artillery for hypernymy detection.

2016

Exploitation of Co-reference in Distributional Semantics
Dominik Schlechtweg
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The aim of distributional semantics is to model the similarity of the meaning of words via the words they occur with. Thereby, it relies on the distributional hypothesis implying that similar words have similar contexts. Deducing meaning from the distribution of words is interesting as it can be done automatically on large amounts of freely available raw text. It is because of this convenience that most current state-of-the-art models of distributional semantics operate on raw text, although there have been successful attempts to integrate other kinds of information (e.g., syntactic) to improve distributional semantic models. In contrast, less attention has been paid to semantic information in the research community. One reason for this is that the extraction of semantic information from raw text is a complex, elaborate matter and in large part not yet satisfactorily solved. Recently, however, there have been successful attempts to integrate a certain kind of semantic information, i.e., co-reference. Two basically different kinds of information contributed by co-reference with respect to the distribution of words will be identified. We will then focus on one of these and examine its general potential to improve distributional semantic models as well as certain more specific hypotheses.
