Kiamehr Rezaee


pdf bib
TweetTER: A Benchmark for Target Entity Retrieval on Twitter without Knowledge Bases
Kiamehr Rezaee | Jose Camacho-Collados | Mohammad Taher Pilehvar
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Entity linking is a well-established task in NLP consisting of associating entity mentions with entries in a knowledge base. Current models have demonstrated competitive performance in standard text settings. However, when it comes to noisy domains such as social media, certain challenges still persist. Typically, to evaluate entity linking on existing benchmarks, a comprehensive knowledge base is necessary and models are expected to possess an understanding of all the entities contained within the knowledge base. However, in practical scenarios where the objective is to retrieve sentences specifically related to a particular entity, strict adherence to a complete understanding of all entities in the knowledge base may not be necessary. To address this gap, we introduce TweetTER (Tweet Target Entity Retrieval), a novel benchmark that aims to bridge the challenges in entity linking. The distinguishing feature of this benchmark is its approach of re-framing entity linking as a binary entity retrieval task. This enables the evaluation of language models’ performance without relying on a conventional knowledge base, providing a more practical and versatile evaluation framework for assessing the effectiveness of language models in entity retrieval tasks.


pdf bib
SuperTweetEval: A Challenging, Unified and Heterogeneous Benchmark for Social Media NLP Research
Dimosthenis Antypas | Asahi Ushio | Francesco Barbieri | Leonardo Neves | Kiamehr Rezaee | Luis Espinosa-Anke | Jiaxin Pei | Jose Camacho-Collados
Findings of the Association for Computational Linguistics: EMNLP 2023

Despite its relevance, the maturity of NLP for social media pales in comparison with general-purpose models, metrics and benchmarks. This fragmented landscape makes it hard for the community to know, for instance, given a task, which is the best performing model and how it compares with others. To alleviate this issue, we introduce a unified benchmark for NLP evaluation in social media, SuperTweetEval, which includes a heterogeneous set of tasks and datasets combined, adapted and constructed from scratch. We benchmarked the performance of a wide range of models on SuperTweetEval and our results suggest that, despite the recent advances in language modelling, social media remains challenging.


pdf bib
Exploiting Language Model Prompts Using Similarity Measures: A Case Study on the Word-in-Context Task
Mohsen Tabasi | Kiamehr Rezaee | Mohammad Taher Pilehvar
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

As a recent development in few-shot learning, prompt-based techniques have demonstrated promising potential in a variety of natural language processing tasks. However, despite proving competitive on most tasks in the GLUE and SuperGLUE benchmarks, existing prompt-based techniques fail on the semantic distinction task of the Word-in-Context (WiC) dataset. Specifically, none of the existing few-shot approaches (including the in-context learning of GPT-3) can attain a performance that is meaningfully different from the random baseline. Trying to fill this gap, we propose a new prompting technique, based on similarity metrics, which boosts few-shot performance to the level of fully supervised methods. Our simple adaptation shows that the failure of existing prompt-based techniques in semantic distinction is due to their improper configuration, rather than lack of relevant knowledge in the representations. We also show that this approach can be effectively extended to other downstream tasks for which a single prompt is sufficient.

pdf bib
Probing Relational Knowledge in Language Models via Word Analogies
Kiamehr Rezaee | Jose Camacho-Collados
Findings of the Association for Computational Linguistics: EMNLP 2022

Understanding relational knowledge plays an integral part in natural language comprehension. When it comes to pre-trained language models (PLM), prior work has been focusing on probing relational knowledge this by filling the blanks in pre-defined prompts such as “The capital of France is —". However, these probes may be affected by the co-occurrence of target relation words and entities (e.g. “capital”, “France” and “Paris”) in the pre-training corpus. In this work, we extend these probing methodologies leveraging analogical proportions as a proxy to probe relational knowledge in transformer-based PLMs without directly presenting the desired relation. In particular, we analysed the ability of PLMs to understand (1) the directionality of a given relation (e.g. Paris-France is not the same as France-Paris); (2) the ability to distinguish types on a given relation (both France and Japan are countries); and (3) the relation itself (Paris is the capital of France, but not Rome). Our results show how PLMs are extremely accurate at (1) and (2), but have clear room for improvement for (3). To better understand the reasons behind this behaviour and mistakes made by PLMs, we provide an extended quantitative analysis based on relevant factors such as frequency.

pdf bib
TweetNLP: Cutting-Edge Natural Language Processing for Social Media
Jose Camacho-collados | Kiamehr Rezaee | Talayeh Riahi | Asahi Ushio | Daniel Loureiro | Dimosthenis Antypas | Joanne Boisson | Luis Espinosa Anke | Fangyu Liu | Eugenio Martínez Cámara
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

In this paper we present TweetNLP, an integrated platform for Natural Language Processing (NLP) in social media. TweetNLP supports a diverse set of NLP tasks, including generic focus areas such as sentiment analysis and named entity recognition, as well as social media-specific tasks such as emoji prediction and offensive language identification. Task-specific systems are powered by reasonably-sized Transformer-based language models specialized on social media text (in particular, Twitter) which can be run without the need for dedicated hardware or cloud services. The main contributions of TweetNLP are: (1) an integrated Python library for a modern toolkit supporting social media analysis using our various task-specific models adapted to the social domain; (2) an interactive online demo for codeless experimentation using our models; and (3) a tutorial covering a wide variety of typical social media applications.


pdf bib
On the Cross-lingual Transferability of Contextualized Sense Embeddings
Kiamehr Rezaee | Daniel Loureiro | Jose Camacho-Collados | Mohammad Taher Pilehvar
Proceedings of the 1st Workshop on Multilingual Representation Learning

In this paper we analyze the extent to which contextualized sense embeddings, i.e., sense embeddings that are computed based on contextualized word embeddings, are transferable across languages. To this end, we compiled a unified cross-lingual benchmark for Word Sense Disambiguation. We then propose two simple strategies to transfer sense-specific knowledge across languages and test them on the benchmark. Experimental results show that this contextualized knowledge can be effectively transferred to similar languages through pre-trained multilingual language models, to the extent that they can out-perform monolingual representations learnednfrom existing language-specific data.

pdf bib
WiC-TSV: An Evaluation Benchmark for Target Sense Verification of Words in Context
Anna Breit | Artem Revenko | Kiamehr Rezaee | Mohammad Taher Pilehvar | Jose Camacho-Collados
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

We present WiC-TSV, a new multi-domain evaluation benchmark for Word Sense Disambiguation. More specifically, we introduce a framework for Target Sense Verification of Words in Context which grounds its uniqueness in the formulation as binary classification task thus being independent of external sense inventories, and the coverage of various domains. This makes the dataset highly flexible for the evaluation of a diverse set of models and systems in and across domains. WiC-TSV provides three different evaluation settings, depending on the input signals provided to the model. We set baseline performance on the dataset using state-of-the-art language models. Experimental results show that even though these models can perform decently on the task, there remains a gap between machine and human performance, especially in out-of-domain settings. WiC-TSV data is available at

pdf bib
Analysis and Evaluation of Language Models for Word Sense Disambiguation
Daniel Loureiro | Kiamehr Rezaee | Mohammad Taher Pilehvar | Jose Camacho-Collados
Computational Linguistics, Volume 47, Issue 2 - June 2021

Transformer-based language models have taken many fields in NLP by storm. BERT and its derivatives dominate most of the existing evaluation benchmarks, including those for Word Sense Disambiguation (WSD), thanks to their ability in capturing context-sensitive semantic nuances. However, there is still little knowledge about their capabilities and potential limitations in encoding and recovering word senses. In this article, we provide an in-depth quantitative and qualitative analysis of the celebrated BERT model with respect to lexical ambiguity. One of the main conclusions of our analysis is that BERT can accurately capture high-level sense distinctions, even when a limited number of examples is available for each word sense. Our analysis also reveals that in some cases language models come close to solving coarse-grained noun disambiguation under ideal conditions in terms of availability of training data and computing resources. However, this scenario rarely occurs in real-world settings and, hence, many practical challenges remain even in the coarse-grained setting. We also perform an in-depth comparison of the two main language model-based WSD strategies, namely, fine-tuning and feature extraction, finding that the latter approach is more robust with respect to sense bias and it can better exploit limited available training data. In fact, the simple feature extraction strategy of averaging contextualized embeddings proves robust even using only three training sentences per word sense, with minimal improvements obtained by increasing the size of this training data.