Zixin Chen
2024
CofiPara: A Coarse-to-fine Paradigm for Multimodal Sarcasm Target Identification with Large Multimodal Models
Zixin Chen
|
Hongzhan Lin
|
Ziyang Luo
|
Mingfei Cheng
|
Jing Ma
|
Guang Chen
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Social media abounds with multimodal sarcasm, and identifying sarcasm targets is particularly challenging due to the implicit incongruity not directly evident in the text and image modalities. Current methods for Multimodal Sarcasm Target Identification (MSTI) predominantly focus on superficial indicators in an end-to-end manner, overlooking the nuanced understanding of multimodal sarcasm conveyed through both the text and image. This paper proposes a versatile MSTI framework with a coarse-to-fine paradigm, by augmenting sarcasm explainability with reasoning and pre-training knowledge. Inspired by the powerful capacity of Large Multimodal Models (LMMs) on multimodal reasoning, we first engage LMMs to generate competing rationales for coarser-grained pre-training of a small language model on multimodal sarcasm detection. We then propose fine-tuning the model for finer-grained sarcasm target identification. Our framework is thus empowered to adeptly unveil the intricate targets within multimodal sarcasm and mitigate the negative impact posed by potential noise inherently in LMMs. Experimental results demonstrate that our model far outperforms state-of-the-art MSTI methods, and markedly exhibits explainability in deciphering sarcasm as well.
FIRST: Teach A Reliable Large Language Model Through Efficient Trustworthy Distillation
KaShun Shum
|
Minrui Xu
|
Jianshu Zhang
|
Zixin Chen
|
Shizhe Diao
|
Hanze Dong
|
Jipeng Zhang
|
Muhammad Omer Raza
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) have become increasingly prevalent in our daily lives, leading to an expectation for LLMs to be trustworthy —- both accurate and well-calibrated (the prediction confidence should align with its ground truth correctness likelihood). Nowadays, fine-tuning has become the most popular method for adapting a model to practical usage by significantly increasing accuracy on downstream tasks. Despite the great accuracy it achieves, we found fine-tuning is still far away from satisfactory trustworthiness due to “tuning-induced mis-calibration”. In this paper, we delve deeply into why and how mis-calibration exists in fine-tuned models, and how distillation can alleviate the issue. Then we further propose a brand new method named Efficient Trustworthy Distillation (FIRST), which utilizes a small portion of teacher’s knowledge to obtain a reliable language model in a cost-efficient way. Specifically, we identify the “concentrated knowledge” phenomenon during distillation, which can significantly reduce the computational burden. Then we apply a “trustworthy maximization” process to optimize the utilization of this small portion of concentrated knowledge before transferring it to the student. Experimental results demonstrate the effectiveness of our method, where better accuracy (+2.3%) and less mis-calibration (-10%) are achieved on average across both in-domain and out-of-domain scenarios, indicating better trustworthiness.
Search
Fix data
Co-authors
- Guang Chen 1
- Mingfei Cheng 1
- Shizhe Diao 1
- Hanze Dong 1
- Hongzhan Lin 1
- show all...