2024
How Are Metaphors Processed by Language Models? The Case of Analogies
Joanne Boisson | Asahi Ushio | Hsuvas Borkakoty | Kiamehr Rezaee | Dimosthenis Antypas | Zara Siddique | Nina White | Jose Camacho-Collados
Proceedings of the 28th Conference on Computational Natural Language Learning
The ability to compare by analogy, metaphorically or not, lies at the core of how humans understand the world and communicate. In this paper, we study the likelihood of metaphoric outputs and the capability of a wide range of pretrained transformer-based language models to distinguish metaphors from other types of analogies, including anomalous ones. In particular, we are interested in discovering whether language models recognise metaphorical analogies as well as other types of analogies, and whether model size has an impact on this ability. The results show relevant differences when using perplexity as a proxy, with larger models reducing the gap both in analogical processing and in distinguishing metaphors from incorrect analogies. This behaviour does not translate into increased difficulty for larger generative models in identifying metaphors, compared to other types of analogies, from anomalous sentences in a zero-shot generation setting, when the perplexity values of metaphoric and non-metaphoric analogies are similar.
2023
Construction Artifacts in Metaphor Identification Datasets
Joanne Boisson | Luis Espinosa-Anke | Jose Camacho-Collados
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Metaphor identification aims at understanding whether a given expression is used figuratively in context. However, in this paper we show how existing metaphor identification datasets can be gamed by fully ignoring the potential metaphorical expression or the context in which it occurs. We test this hypothesis in a variety of datasets and settings, and show that metaphor identification systems based on language models without complete information can be competitive with those using the full context. This is due to the construction procedures to build such datasets, which introduce unwanted biases for positive and negative classes. Finally, we test the same hypothesis on datasets that are carefully sampled from natural corpora and where this bias is not present, making these datasets more challenging and reliable.
2022
TweetNLP: Cutting-Edge Natural Language Processing for Social Media
Jose Camacho-Collados | Kiamehr Rezaee | Talayeh Riahi | Asahi Ushio | Daniel Loureiro | Dimosthenis Antypas | Joanne Boisson | Luis Espinosa Anke | Fangyu Liu | Eugenio Martínez Cámara
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
In this paper we present TweetNLP, an integrated platform for Natural Language Processing (NLP) in social media. TweetNLP supports a diverse set of NLP tasks, including generic focus areas such as sentiment analysis and named entity recognition, as well as social media-specific tasks such as emoji prediction and offensive language identification. Task-specific systems are powered by reasonably-sized Transformer-based language models specialized on social media text (in particular, Twitter) which can be run without the need for dedicated hardware or cloud services. The main contributions of TweetNLP are: (1) an integrated Python library for a modern toolkit supporting social media analysis using our various task-specific models adapted to the social domain; (2) an interactive online demo for codeless experimentation using our models; and (3) a tutorial covering a wide variety of typical social media applications.
CardiffNLP-Metaphor at SemEval-2022 Task 2: Targeted Fine-tuning of Transformer-based Language Models for Idiomaticity Detection
Joanne Boisson | Jose Camacho-Collados | Luis Espinosa-Anke
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
This paper describes the experiments run for SemEval-2022 Task 2, subtask A, covering the zero-shot and one-shot settings for idiomaticity detection. Our main approach is based on fine-tuning transformer-based language models as a baseline to perform binary classification. Our system, CardiffNLP-Metaphor, ranked 8th and 7th on the zero- and one-shot settings of this task, respectively. Our main contribution lies in the extensive evaluation of transformer-based language models under various configurations, showing, among others, the potential of large multilingual models over base monolingual models. Moreover, we analyse the impact of various input parameters, which offers interesting insights into how language models work in practice.
2015
WriteAhead: Mining Grammar Patterns in Corpora for Assisted Writing
Tzu-Hsi Yen | Jian-Cheng Wu | Jim Chang | Joanne Boisson | Jason Chang
Proceedings of ACL-IJCNLP 2015 System Demonstrations
Learning Sentential Patterns of Various Rhetoric Moves for Assisted Academic Writing
Jim Chang | Hsiang-Ling Hsu | Joanne Boisson | Hao-Chun Peng | Yu-Hsuan Wu | Jason S. Chang
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation: Posters
2013
Linggle: a Web-scale Linguistic Search Engine for Words in Context
Joanne Boisson | Ting-Hui Kao | Jian-Cheng Wu | Tzu-Hsi Yen | Jason S. Chang
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations
CoNLL-2013 Shared Task: Grammatical Error Correction NTHU System Description
Ting-Hui Kao | Yu-Wei Chang | Hsun-Wen Chiu | Tzu-Hsi Yen | Joanne Boisson | Jian-Cheng Wu | Jason S. Chang
Proceedings of the Seventeenth Conference on Computational Natural Language Learning: Shared Task