2024
TRoTR: A Framework for Evaluating the Re-contextualization of Text Reuse
Francesco Periti, Pierluigi Cassotti, Stefano Montanelli, Nina Tahmasebi, Dominik Schlechtweg
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Current approaches for detecting text reuse do not focus on recontextualization, i.e., how the new context(s) of a reused text differs from its original context(s). In this paper, we propose a novel framework called TRoTR that relies on the notion of topic relatedness for evaluating the diachronic change of context in which text is reused. TRoTR includes two NLP tasks: TRiC and TRaC. TRiC is designed to evaluate the topic relatedness between a pair of recontextualizations. TRaC is designed to evaluate the overall topic variation within a set of recontextualizations. We also provide a curated TRoTR benchmark of biblical text reuse, human-annotated with topic relatedness. The benchmark exhibits an inter-annotator agreement of .811. We evaluate multiple established SBERT models on the TRoTR tasks and find that they exhibit greater sensitivity to textual similarity than to topic relatedness. Our experiments show that fine-tuning these models can mitigate this sensitivity.
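To make the TRiC setting concrete, below is a minimal sketch of scoring the topic relatedness of two recontextualizations with an off-the-shelf SBERT model via sentence-transformers. The checkpoint name and the example contexts are illustrative assumptions, not the benchmark data or the paper's fine-tuned models.

```python
# Score topic relatedness between two contexts reusing the same text.
# Checkpoint and examples are illustrative, not the paper's setup.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any SBERT checkpoint

reuse = "Love your neighbor as yourself."
context_a = f"Posted under a charity appeal: '{reuse}' Please donate today."
context_b = f"Quoted in a sermon on forgiveness: '{reuse}'"

# Encode both recontextualizations and compare them with cosine similarity.
emb_a, emb_b = model.encode([context_a, context_b], convert_to_tensor=True)
print(f"predicted relatedness: {util.cos_sim(emb_a, emb_b).item():.3f}")
```

As the abstract notes, an off-the-shelf model like this tends to track surface textual similarity; fine-tuning on topic-relatedness judgments is what pushes the score toward the intended signal.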
Automatically Generated Definitions and their Utility for Modeling Word Meaning
Francesco Periti, David Alfter, Nina Tahmasebi
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Modeling lexical semantics is a challenging task, often suffering from interpretability pitfalls. In this paper, we delve into the generation of dictionary-like sense definitions and explore their utility for modeling word meaning. We fine-tune two Llama models and include an existing T5-based model in our evaluation. First, we evaluate the quality of the generated definitions on existing English benchmarks, setting new state-of-the-art results for the Definition Generation task. Next, we explore the use of definitions generated by our models as intermediate representations that are subsequently encoded as sentence embeddings. We evaluate this approach on the lexical semantics tasks of Word-in-Context, Word Sense Induction, and Lexical Semantic Change, setting new state-of-the-art results in all three tasks when compared to unsupervised baselines.
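A minimal sketch of the definitions-as-intermediate-representations idea: generate a dictionary-like definition for a word in context with a seq2seq model, then encode the definition as a sentence embedding. The checkpoints below (an off-the-shelf Flan-T5 and a generic SBERT encoder) are stand-ins, not the fine-tuned Llama and T5 models evaluated in the paper.

```python
# Generate a definition for a word in context, then embed the definition.
# Both checkpoints are generic stand-ins for the paper's fine-tuned models.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from sentence_transformers import SentenceTransformer

tok = AutoTokenizer.from_pretrained("google/flan-t5-base")
gen = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

usage = "The bank raised interest rates again."
prompt = f"Define the word 'bank' as used in: {usage}"
ids = tok(prompt, return_tensors="pt").input_ids
definition = tok.decode(gen.generate(ids, max_new_tokens=32)[0],
                        skip_special_tokens=True)

# Two usages with the same sense should yield near-identical definitions,
# and hence nearby embeddings, which WiC/WSI/LSC pipelines can exploit.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
embedding = encoder.encode(definition)
```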
(Chat)GPT v BERT: Dawn of Justice for Semantic Change Detection
Francesco Periti, Haim Dubossarsky, Nina Tahmasebi
Findings of the Association for Computational Linguistics: EACL 2024
In the universe of Natural Language Processing, Transformer-based language models like BERT and (Chat)GPT have emerged as lexical superheroes with great power to solve open research problems. In this paper, we specifically focus on the temporal problem of semantic change, and evaluate their ability to solve two diachronic extensions of the Word-in-Context (WiC) task: TempoWiC and HistoWiC. In particular, we investigate the potential of a novel, off-the-shelf technology like ChatGPT (and GPT-3.5) compared to BERT, which represents a family of models that currently stands as the state of the art for modeling semantic change. Our experiments represent the first attempt to assess the use of (Chat)GPT for studying semantic change. Our results indicate that ChatGPT performs significantly worse than the foundational GPT version. Furthermore, our results demonstrate that (Chat)GPT achieves slightly lower performance than BERT in detecting long-term changes but performs significantly worse in detecting short-term changes.
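For context on the BERT side of such comparisons, below is a minimal sketch of scoring a single WiC instance with BERT: embed the target word in each sentence and threshold the cosine similarity between the two embeddings. The sentences, the span-matching logic, and the 0.5 threshold are illustrative assumptions, not the paper's pipeline.

```python
# Judge whether a target word keeps its sense across two sentences (WiC-style).
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def target_embedding(sentence: str, target: str) -> torch.Tensor:
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    # Locate the target's subword span and average its vectors.
    target_ids = tok(target, add_special_tokens=False).input_ids
    ids = enc.input_ids[0].tolist()
    for i in range(len(ids) - len(target_ids) + 1):
        if ids[i:i + len(target_ids)] == target_ids:
            return hidden[i:i + len(target_ids)].mean(dim=0)
    raise ValueError(f"'{target}' not found in: {sentence}")

e1 = target_embedding("The plane banked sharply to the left.", "banked")
e2 = target_embedding("She banked the check on Monday.", "banked")
sim = torch.cosine_similarity(e1, e2, dim=0).item()
print(f"same sense: {sim > 0.5}")  # threshold is illustrative
```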
Proceedings of the 5th Workshop on Computational Approaches to Historical Language Change
Nina Tahmasebi, Syrielle Montariol, Andrey Kutuzov, David Alfter, Francesco Periti, Pierluigi Cassotti, Netta Huebscher
Improving Word Usage Graphs with Edge Induction
Bill Noble, Francesco Periti, Nina Tahmasebi
Proceedings of the 5th Workshop on Computational Approaches to Historical Language Change
Towards a Complete Solution to Lexical Semantic Change: an Extension to Multiple Time Periods and Diachronic Word Sense Induction
Francesco Periti, Nina Tahmasebi
Proceedings of the 5th Workshop on Computational Approaches to Historical Language Change
A Systematic Comparison of Contextualized Word Embeddings for Lexical Semantic Change
Francesco Periti, Nina Tahmasebi
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Contextualized embeddings are the preferred tool for modeling Lexical Semantic Change (LSC). Current evaluations typically focus on a specific task known as Graded Change Detection (GCD). However, performance comparisons across works are often misleading due to their reliance on diverse settings. In this paper, we evaluate state-of-the-art models and approaches for GCD under equal conditions. We further break the LSC problem into Word-in-Context (WiC) and Word Sense Induction (WSI) tasks, and compare models across these different levels. Our evaluation is performed across different languages on eight available benchmarks for LSC, and shows that (i) APD outperforms other approaches for GCD; (ii) XL-LEXEME outperforms other contextualized models for WiC, WSI, and GCD, while being comparable to GPT-4; and (iii) there is a clear need to improve the modeling of word meanings, as well as to focus on *how*, *when*, and *why* these meanings change, rather than solely on the extent of semantic change.
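The APD approach in finding (i) is compact enough to state directly: a word's graded change score is the mean cosine distance over all cross-period pairs of its contextualized usage embeddings. A minimal sketch, with random matrices standing in for real usage embeddings:

```python
# APD (Average Pairwise Distance) for Graded Change Detection.
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
usages_t1 = rng.normal(size=(100, 768))  # embeddings of a word in period 1
usages_t2 = rng.normal(size=(120, 768))  # embeddings of the same word in period 2

# Mean cosine distance over all cross-period usage pairs.
apd = cdist(usages_t1, usages_t2, metric="cosine").mean()
print(f"graded change score: {apd:.3f}")
```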
Computational modeling of semantic change
Pierluigi Cassotti, Francesco Periti, Stefano De Pascale, Haim Dubossarsky, Nina Tahmasebi
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Tutorial Abstracts
Languages change constantly over time, influenced by social, technological, cultural and political factors that affect how people express themselves. In particular, words can undergo the process of semantic change, which can be subtle and significantly impact the interpretation of texts. For example, the word terrific used to mean ‘causing terror’ and was thus synonymous with terrifying. Nowadays, speakers use the word in the sense of ‘excessive’ and even ‘amazing’. In Historical Linguistics, tools and methods have been developed to analyse this phenomenon, including systematic categorisations of the types of change, their causes, and the mechanisms underlying the different types of change. However, traditional linguistic methods, while informative, are often based on small, carefully curated samples. Thanks to the availability of large diachronic corpora, computational means to model word meaning unsupervised, and evaluation benchmarks, we are seeing an increasing interest in the computational modelling of semantic change. This is evidenced by the increasing number of publications in this new domain as well as the organisation of initiatives and events related to this topic, such as four editions of the International Workshop on Computational Approaches to Historical Language Change (LChange), and several evaluation campaigns (Schlechtweg et al., 2020a; Basile et al., 2020b; Kutuzov et al.; Zamora-Reina et al., 2022).
Analyzing Semantic Change through Lexical Replacements
Francesco Periti, Pierluigi Cassotti, Haim Dubossarsky, Nina Tahmasebi
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Modern language models are capable of contextualizing words based on their surrounding context. However, this capability is often compromised due to semantic change that leads to words being used in new, unexpected contexts not encountered during pre-training. In this paper, we model semantic change by studying the effect of unexpected contexts introduced by lexical replacements. We propose a replacement schema where a target word is substituted with lexical replacements of varying relatedness, thus simulating different kinds of semantic change. Furthermore, we leverage the replacement schema as a basis for a novel interpretable model for semantic change. We are also the first to evaluate the use of LLaMa for semantic change detection.
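As a rough illustration of such a replacement schema, the sketch below substitutes a target word with replacements of varying relatedness and measures how far the sentence embedding moves. The sentence, the replacement categories, and the SBERT checkpoint are illustrative assumptions, not the paper's schema or models.

```python
# Simulate semantic change by swapping a target word for replacements of
# varying relatedness and measuring the resulting embedding shift.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
sentence = "The cell was cold and damp."
target = "cell"
replacements = {
    "synonym": "prison cell",   # close in meaning: small shift expected
    "related": "dungeon",       # related: moderate shift
    "unrelated": "smartphone",  # unrelated: simulates a drastic change in use
}

base = model.encode(sentence, convert_to_tensor=True)
for kind, word in replacements.items():
    variant = sentence.replace(target, word)
    emb = model.encode(variant, convert_to_tensor=True)
    print(f"{kind:>9}: similarity to original = {util.cos_sim(base, emb).item():.3f}")
```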
2023
Proceedings of the 4th Workshop on Computational Approaches to Historical Language Change
Nina Tahmasebi, Syrielle Montariol, Haim Dubossarsky, Andrey Kutuzov, Simon Hengchen, David Alfter, Francesco Periti, Pierluigi Cassotti
2022
What is Done is Done: an Incremental Approach to Semantic Shift Detection
Francesco Periti, Alfio Ferrara, Stefano Montanelli, Martin Ruskov
Proceedings of the 3rd Workshop on Computational Approaches to Historical Language Change
Contextual word embedding techniques for semantic shift detection are receiving increasing attention. In this paper, we present What is Done is Done (WiDiD), an incremental approach to semantic shift detection that combines incremental clustering techniques with contextual embeddings to capture changes in the meanings of a target word across a diachronic corpus. In WiDiD, the word contexts observed in the past are consolidated into a set of clusters that constitute the “memory” of the word meanings observed so far. This memory serves as the basis for subsequent word observations, so that the meanings observed in the present are stratified over the past ones.
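The sketch below illustrates the incremental idea on synthetic data: usages arrive one time slice at a time and are consolidated into a persistent cluster state that later slices build on. scikit-learn's Birch is used here only as a generic incremental clusterer; it is a stand-in, not the clustering algorithm used in WiDiD, and the embeddings are random.

```python
# Incrementally cluster a word's usages, one time period at a time.
import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(0)
# Fake contextual embeddings for three time periods, drifting over time.
slices = [rng.normal(loc=t, size=(50, 768)) for t in range(3)]

clusterer = Birch(n_clusters=None, threshold=25.0)
for t, embeddings in enumerate(slices):
    clusterer.partial_fit(embeddings)       # consolidate new usages into "memory"
    labels = clusterer.predict(embeddings)  # sense clusters active in this period
    print(f"period {t}: {len(set(labels))} sense cluster(s)")
```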