Yucheng Li


2024

pdf bib
With Ears to See and Eyes to Hear: Sound Symbolism Experiments with Multimodal Large Language Models
Tyler Loakman | Yucheng Li | Chenghua Lin
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Recently, Large Language Models (LLMs) and Vision Language Models (VLMs) have demonstrated aptitude as potential substitutes for human participants in experiments testing psycholinguistic phenomena. However, an understudied question is to what extent models that only have access to vision and text modalities are able to implicitly understand sound-based phenomena via abstract reasoning from orthography and imagery alone. To investigate this, we analyse the ability of VLMs and LLMs to demonstrate sound symbolism (i.e., to recognise a non-arbitrary link between sounds and concepts) as well as their ability to “hear” via the interplay of the language and vision modules of open and closed-source multimodal models. We perform multiple experiments, including replicating the classic Kiki-Bouba and Mil-Mal shape and magnitude symbolism tasks and comparing human judgements of linguistic iconicity with that of LLMs. Our results show that VLMs demonstrate varying levels of agreement with human labels, and more task information may be required for VLMs versus their human counterparts for in silico experimentation. We additionally see through higher maximum agreement levels that Magnitude Symbolism is an easier pattern for VLMs to identify than Shape Symbolism, and that an understanding of linguistic iconicity is highly dependent on model size.

pdf bib
An Open-Source Data Contamination Report for Large Language Models
Yucheng Li | Yunhao Guo | Frank Guerin | Chenghua Lin
Findings of the Association for Computational Linguistics: EMNLP 2024

Data contamination in model evaluation has become increasingly prevalent with the growing popularity of large language models. It allows models to “cheat” via memorisation instead of displaying true capabilities. Therefore, contamination analysis has become an crucial part of reliable model evaluation to validate results. However, existing contamination analysis is usually conducted internally by large language model developers and often lacks transparency and completeness. This paper presents an extensive data contamination report for over 15 popular large language models across six popular multiple-choice QA benchmarks. We also introduce an open-source pipeline that enables the community to perform contamination analysis on customised data and models. Our experiments reveal varying contamination levels ranging from 1% to 45% across benchmarks, with the contamination degree increasing rapidly over time. Performance analysis of large language models indicates that data contamination does not necessarily lead to increased model metrics: while significant accuracy boosts of up to 14% and 7% are observed on contaminated C-Eval and Hellaswag benchmarks, only a minimal increase is noted on contaminated MMLU. We also find larger models seem able to gain more advantages than smaller models on contaminated test sets.

pdf bib
On the Rigour of Scientific Writing: Criteria, Analysis, and Insights
Joseph James | Chenghao Xiao | Yucheng Li | Chenghua Lin
Findings of the Association for Computational Linguistics: EMNLP 2024

Rigour is crucial for scientific research as it ensures the reproducibility and validity of results and findings. Despite its importance, little work exists on modelling rigour computationally, and there is a lack of analysis on whether these criteria can effectively signal or measure the rigour of scientific papers in practice. In this paper, we introduce a bottom-up, data-driven framework to automatically identify and define rigour criteria and assess their relevance in scientific writing. Our framework includes rigour keyword extraction, detailed rigour definition generation, and salient criteria identification. Furthermore, our framework is domain-agnostic and can be tailored to the evaluation of scientific rigour for different areas, accommodating the distinct salient criteria across fields. We conducted comprehensive experiments based on datasets collected from different domains (e.g. ICLR, ACL) to demonstrate the effectiveness of our framework in modelling rigour. In addition, we analyse linguist patterns of rigour, revealing that framing certainty is crucial for enhancing the perception of scientific rigour, while suggestion certainty and probability uncertainty diminish it.

pdf bib
Data Contamination Report from the 2024 CONDA Shared Task
Oscar Sainz | Iker García-Ferrero | Alon Jacovi | Jon Ander Campos | Yanai Elazar | Eneko Agirre | Yoav Goldberg | Wei-Lin Chen | Jenny Chim | Leshem Choshen | Luca D’Amico-Wong | Melissa Dell | Run-Ze Fan | Shahriar Golchin | Yucheng Li | Pengfei Liu | Bhavish Pahwa | Ameya Prabhu | Suryansh Sharma | Emily Silcock | Kateryna Solonko | David Stap | Mihai Surdeanu | Yu-Min Tseng | Vishaal Udandarao | Zengzhi Wang | Ruijie Xu | Jinglin Yang
Proceedings of the 1st Workshop on Data Contamination (CONDA)

The 1st Workshop on Data Contamination (CONDA 2024) focuses on all relevant aspects of data contamination in natural language processing, where data contamination is understood as situations where evaluation data is included in pre-training corpora used to train large scale models, compromising evaluation results. The workshop fostered a shared task to collect evidence on data contamination in current available datasets and models. The goal of the shared task and associated database is to assist the community in understanding the extent of the problem and to assist researchers in avoiding reporting evaluation results on known contaminated resources. The shared task provides a structured, centralized public database for the collection of contamination evidence, open to contributions from the community via GitHub pool requests. This first compilation paper is based on 566 reported entries over 91 contaminated sources from a total of 23 contributors. The details of the individual contamination events are available in the platform. The platform continues to be online, open to contributions from the community.

2023

pdf bib
Metaphor Detection via Explicit Basic Meanings Modelling
Yucheng Li | Shun Wang | Chenghua Lin | Frank Guerin
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

One noticeable trend in metaphor detection is the embrace of linguistic theories such as the metaphor identification procedure (MIP) for model architecture design. While MIP clearly defines that the metaphoricity of a lexical unit is determined based on the contrast between its contextual meaning and its basic meaning, existing work does not strictly follow this principle, typically using the aggregated meaning to approximate the basic meaning of target words. In this paper, we propose a novel metaphor detection method, which models the basic meaning of the word based on literal annotation from the training set, and then compares this with the contextual meaning in a target sentence to identify metaphors. Empirical results show that our method outperforms the state-of-the-art method significantly by 1.0% in F1 score. Moreover, our performance even reaches the theoretical upper bound on the VUA18 benchmark for targets with basic annotations, which demonstrates the importance of modelling basic meanings for metaphor detection.

pdf bib
Compressing Context to Enhance Inference Efficiency of Large Language Models
Yucheng Li | Bo Dong | Frank Guerin | Chenghua Lin
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Large language models (LLMs) achieved remarkable performance across various tasks. However, they face challenges in managing long documents and extended conversations, due to significantly increased computational requirements, both in memory and inference time, and potential context truncation when the input exceeds the LLM’s fixed context length. This paper proposes a method called Selective Context that enhances the inference efficiency of LLMs by identifying and pruning redundancy in the input context to make the input more compact. We test our approach using common data sources requiring long context processing: arXiv papers, news articles, and long conversations, on tasks of summarisation, question answering, and response generation. Experimental results show that Selective Context significantly reduces memory cost and decreases generation latency while maintaining comparable performance compared to that achieved when full context is used. Specifically, we achieve a 50% reduction in context cost, resulting in a 36% reduction in inference memory usage and a 32% reduction in inference time, while observing only a minor drop of .023 in BERTscore and .038 in faithfulness on four downstream applications, indicating that our method strikes a good balance between efficiency and performance.

pdf bib
Metaphor Detection with Effective Context Denoising
Shun Wang | Yucheng Li | Chenghua Lin | Loic Barrault | Frank Guerin
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

We propose a novel RoBERTa-based model, RoPPT, which introduces a target-oriented parse tree structure in metaphor detection. Compared to existing models, RoPPT focuses on semantically relevant information and achieves the state-of-the-art on several main metaphor datasets. We also compare our approach against several popular denoising and pruning methods, demonstrating the effectiveness of our approach in context denoising. Our code and dataset can be found at https://github.com/MajiBear000/RoPPT.

pdf bib
FrameBERT: Conceptual Metaphor Detection with Frame Embedding Learning
Yucheng Li | Shun Wang | Chenghua Lin | Frank Guerin | Loic Barrault
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

In this paper, we propose FrameBERT, a BERT-based model that can explicitly learn and incorporate FrameNet Embeddings for concept-level metaphor detection. FrameBERT not only achieves better or comparable performance to the state-of-the-art, but also is more explainable and interpretable compared to existing models, attributing to its ability of accounting for external knowledge of FrameNet.

2022

pdf bib
Nominal Metaphor Generation with Multitask Learning
Yucheng Li | Chenghua Lin | Frank Guerin
Proceedings of the 15th International Conference on Natural Language Generation

pdf bib
The Secret of Metaphor on Expressing Stronger Emotion
Yucheng Li | Frank Guerin | Chenghua Lin
Proceedings of the 3rd Workshop on Figurative Language Processing (FLP)

Metaphors are proven to have stronger emotional impact than literal expressions. Although this conclusion is shown to be promising in benefiting various NLP applications, the reasons behind this phenomenon are not well studied. This paper conducts the first study in exploring how metaphors convey stronger emotion than their literal counterparts. We find that metaphors are generally more specific than literal expressions. The more specific property of metaphor can be one of the reasons for metaphors’ superiority in emotion expression. When we compare metaphors with literal expressions with the same specificity level, the gap of emotion expressing ability between both reduces significantly. In addition, we observe specificity is crucial in literal language as well, as literal language can express stronger emotion by making it more specific.

pdf bib
CM-Gen: A Neural Framework for Chinese Metaphor Generation with Explicit Context Modelling
Yucheng Li | Chenghua Lin | Frank Guerin
Proceedings of the 29th International Conference on Computational Linguistics

Nominal metaphors are frequently used in human language and have been shown to be effective in persuading, expressing emotion, and stimulating interest. This paper tackles the problem of Chinese Nominal Metaphor (NM) generation. We introduce a novel multitask framework, which jointly optimizes three tasks: NM identification, NM component identification, and NM generation. The metaphor identification module is able to perform a self-training procedure, which discovers novel metaphors from a large-scale unlabeled corpus for NM generation. The NM component identification module emphasizes components during training and conditions the generation on these NM components for more coherent results. To train the NM identification and component identification modules, we construct an annotated corpus consisting of 6.3k sentences that contain diverse metaphorical patterns. Automatic metrics show that our method can produce diverse metaphors with good readability, where 92% of them are novel metaphorical comparisons. Human evaluation shows our model significantly outperforms baselines on consistency and creativity.

2020

pdf bib
Hierarchical Region Learning for Nested Named Entity Recognition
Xinwei Long | Shuzi Niu | Yucheng Li
Findings of the Association for Computational Linguistics: EMNLP 2020

Named Entity Recognition (NER) is deeply explored and widely used in various tasks. Usually, some entity mentions are nested in other entities, which leads to the nested NER problem. Leading region based models face both the efficiency and effectiveness challenge due to the high subsequence enumeration complexity. To tackle these challenges, we propose a hierarchical region learning framework to automatically generate a tree hierarchy of candidate regions with nearly linear complexity and incorporate structure information into the region representation for better classification. Experiments on benchmark datasets ACE-2005, GENIA and JNLPBA demonstrate competitive or better results than state-of-the-art baselines.