Adam Roussel


2024

pdf bib
Tabular JSON: A Proposal for a Pragmatic Linguistic Data Format
Adam Roussel
Proceedings of the 20th Conference on Natural Language Processing (KONVENS 2024)

pdf bib
Guidelines for the Annotation of Intentional Linguistic Metaphor
Stefanie Dipper | Adam Roussel | Alexandra Wiemann | Won Kim | Tra-my Nguyen
Proceedings of the 4th Workshop on Figurative Language Processing (FigLang 2024)

This paper presents guidelines for the annotation of intentional (i.e. non-conventionalized) linguistic metaphors. Expressions that contribute to the same metaphorical image are annotated as a chain, additionally a semantically contrasting expression of the target domain is marked as an anchor. So far, a corpus of ten TEDx talks with a total of 20k tokens has been annotated according to these guidelines. 1.25% of the tokens are intentional metaphorical expressions.

pdf bib
Adapting Measures of Literality for Use with Historical Language Data
Adam Roussel
Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities

This paper concerns the adaptation of two existing computational measures relating to the estimation of the literality of expressions to enable their use in scenarios where data is scarce, as is usually the case with historical language data. Being able to determine an expression’s literality via statistical means could support a range of linguistic annotation tasks, such as those relating to metaphor, metonymy, and idiomatic expressions, however making this judgment is especially difficult for modern annotators of historical and ancient texts. Therefore we re-implement these measures using smaller corpora and count-based vectors more suited to these amounts of training data. The adapted measures are evaluated against an existing data set of particle verbs annotated with degrees of literality. The results were inconclusive, yielding low correlations between 0.05 and 0.10 (Spearman’s ρ). Further work is needed to determine which measures and types of data correspond to which aspects of literality.

2023

pdf bib
Lexical Semantics with Vector Symbolic Architectures
Adam Roussel
Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023)

Conventional approaches to the construction of word vectors typically require very large amounts of unstructured text and powerful computing hardware, and the vectors themselves are also difficult if not impossible to inspect or interpret on their own. In this paper, we introduce a method for building word vectors using the framework of vector symbolic architectures in order to encode the semantic information in wordnets, such as the Open English WordNet or the Open Multilingual Wordnet. Such vectors perform surprisingly well on common word similarity benchmarks, and yet they are transparent, interpretable, and the information contained within them has a clear provenance.

2018

pdf bib
Survey: Anaphora With Non-nominal Antecedents in Computational Linguistics: a Survey
Varada Kolhatkar | Adam Roussel | Stefanie Dipper | Heike Zinsmeister
Computational Linguistics, Volume 44, Issue 3 - September 2018

This article provides an extensive overview of the literature related to the phenomenon of non-nominal-antecedent anaphora (also known as abstract anaphora or discourse deixis), a type of anaphora in which an anaphor like “that” refers to an antecedent (marked in boldface) that is syntactically non-nominal, such as the first sentence in “It’s way too hot here. That’s why I’m moving to Alaska.” Annotating and automatically resolving these cases of anaphora is interesting in its own right because of the complexities involved in identifying non-nominal antecedents, which typically represent abstract objects such as events, facts, and propositions. There is also practical value in the resolution of non-nominal-antecedent anaphora, as this would help computational systems in machine translation, summarization, and question answering, as well as, conceivably, any other task dependent on some measure of text understanding. Most of the existing approaches to anaphora annotation and resolution focus on nominal-antecedent anaphora, classifying many of the cases where the antecedents are syntactically non-nominal as non-anaphoric. There has been some work done on this topic, but it remains scattered and difficult to collect and assess. With this article, we hope to bring together and synthesize work done in disparate contexts up to now in order to identify fundamental problems and draw conclusions from an overarching perspective. Having a good picture of the current state of the art in this field can help researchers direct their efforts to where they are most necessary. Because of the great variety of theoretical approaches that have been brought to bear on the problem, there is an equally diverse array of terminologies that are used to describe it, so we will provide an overview and discussion of these terminologies. We also describe the linguistic properties of non-nominal-antecedent anaphora, examine previous annotation efforts that have addressed this topic, and present the computational approaches that aim at resolving non-nominal-antecedent anaphora automatically. We close with a review of the remaining open questions in this area and some of our recommendations for future research.

pdf bib
Anaphora Resolution with the ARRAU Corpus
Massimo Poesio | Yulia Grishina | Varada Kolhatkar | Nafise Moosavi | Ina Roesiger | Adam Roussel | Fabian Simonjetz | Alexandra Uma | Olga Uryupina | Juntao Yu | Heike Zinsmeister
Proceedings of the First Workshop on Computational Models of Reference, Anaphora and Coreference

The ARRAU corpus is an anaphorically annotated corpus of English providing rich linguistic information about anaphora resolution. The most distinctive feature of the corpus is the annotation of a wide range of anaphoric relations, including bridging references and discourse deixis in addition to identity (coreference). Other distinctive features include treating all NPs as markables, including non-referring NPs; and the annotation of a variety of morphosyntactic and semantic mention and entity attributes, including the genericity status of the entities referred to by markables. The corpus however has not been extensively used for anaphora resolution research so far. In this paper, we discuss three datasets extracted from the ARRAU corpus to support the three subtasks of the CRAC 2018 Shared Task–identity anaphora resolution over ARRAU-style markables, bridging references resolution, and discourse deixis; the evaluation scripts assessing system performance on those datasets; and preliminary results on these three tasks that may serve as baseline for subsequent research in these phenomena.

pdf bib
Detecting and Resolving Shell Nouns in German
Adam Roussel
Proceedings of the First Workshop on Computational Models of Reference, Anaphora and Coreference

This paper describes the design and evaluation of a system for the automatic detection and resolution of shell nouns in German. Shell nouns are general nouns, such as fact, question, or problem, whose full interpretation relies on a content phrase located elsewhere in a text, which these nouns simultaneously serve to characterize and encapsulate. To accomplish this, the system uses a series of lexico-syntactic patterns in order to extract shell noun candidates and their content in parallel. Each pattern has its own classifier, which makes the final decision as to whether or not a link is to be established and the shell noun resolved. Overall, about 26.2% of the annotated shell noun instances were correctly identified by the system, and of these cases, about 72.5% are assigned the correct content phrase. Though it remains difficult to identify shell noun instances reliably (recall is accordingly low in this regard), this system usually assigns the right content to correctly classified cases. cases.