Zhijie Deng


2025

Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models
Zijun Chen | Wenbo Hu | Guande He | Zhijie Deng | Zheng Zhang | Richang Hong
Proceedings of the 31st International Conference on Computational Linguistics

Multimodal large language models (MLLMs) combine visual and textual data for tasks like image captioning and visual question answering. Proper uncertainty calibration is crucial but challenging for reliable use in areas like healthcare and autonomous driving. This paper investigates several MLLMs, focusing on their calibration across various scenarios, including before and after visual fine-tuning as well as before and after multimodal training of the base LLMs. We observed miscalibration in their performance, yet no significant differences in calibration across these scenarios. We also highlight differences in uncertainty between textual and visual information and the impact of integrating these two types of information on uncertainty. To better understand MLLMs’ miscalibration and their ability to self-assess uncertainty, we developed the IDK (I don’t know) dataset, which is key for evaluating how they handle unknowns. Our findings reveal that MLLMs tend to give answers rather than admit uncertainty, but this self-assessment improves with prompt adjustments. Finally, to calibrate MLLMs and enhance model reliability, we propose techniques such as temperature scaling and iterative prompt optimization. Our results provide insights into improving MLLMs for effective and responsible deployment in multimodal applications.
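
The abstract mentions temperature scaling as one of the proposed calibration techniques. Below is a minimal sketch of standard post-hoc temperature scaling (a single scalar fitted on held-out answer logits); the tensor names and optimization settings are illustrative assumptions, not the paper's implementation.

```python
# Minimal post-hoc temperature scaling sketch (standard recipe, not the
# paper's exact code). `val_logits` / `val_labels` are assumed held-out
# model outputs and ground-truth answer indices.
import torch
import torch.nn.functional as F

def fit_temperature(val_logits: torch.Tensor, val_labels: torch.Tensor,
                    steps: int = 200, lr: float = 0.01) -> float:
    """Find a single temperature T > 0 that minimizes NLL on held-out data."""
    log_t = torch.zeros(1, requires_grad=True)   # optimize log T so T stays positive
    optimizer = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = F.cross_entropy(val_logits / log_t.exp(), val_labels)
        loss.backward()
        optimizer.step()
    return log_t.exp().item()

# Usage: divide test-time logits by the fitted temperature before softmax.
# T = fit_temperature(val_logits, val_labels)
# calibrated_probs = F.softmax(test_logits / T, dim=-1)
```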

2024

Efficient Detection of LLM-generated Texts with a Bayesian Surrogate Model
Yibo Miao | Hongcheng Gao | Hao Zhang | Zhijie Deng
Findings of the Association for Computational Linguistics: ACL 2024

The detection of machine-generated text, especially from large language models (LLMs), is crucial in preventing serious social problems resulting from their misuse. Some methods train dedicated detectors on specific datasets but fall short in generalizing to unseen test data, while other zero-shot ones often yield suboptimal performance. Although the recent DetectGPT has shown promising detection performance, it suffers from significant inefficiency issues, as detecting a single candidate requires querying the source LLM with hundreds of its perturbations. This paper aims to bridge this gap. Concretely, we propose to incorporate a Bayesian surrogate model, which allows us to select typical samples based on Bayesian uncertainty and interpolate scores from typical samples to other samples, to improve query efficiency. Empirical results demonstrate that our method significantly outperforms existing approaches under a low query budget. Notably, when detecting the text generated by LLaMA family models, our method with just 2 or 3 queries can outperform DetectGPT with 200 queries.
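
As a rough illustration of the query-efficiency idea, the sketch below uses a Gaussian-process surrogate to actively choose which perturbations to score with the source LLM and to interpolate scores for the rest. The `embed` and `query_llm_logprob` callables are hypothetical placeholders, and the selection loop is a generic uncertainty-driven recipe rather than the paper's exact procedure.

```python
# Sketch: query the source LLM on only a few perturbations chosen by GP
# uncertainty, then predict scores for the remaining ones.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def surrogate_scores(perturbations, embed, query_llm_logprob, budget=3):
    """Return (interpolated) log-prob scores for all perturbations using `budget` real queries."""
    X = np.stack([embed(p) for p in perturbations])          # features for every perturbation
    queried = [0]                                            # start from one real query
    y = [query_llm_logprob(perturbations[0])]
    gp = GaussianProcessRegressor(kernel=RBF())
    for _ in range(budget - 1):
        gp.fit(X[queried], np.array(y))
        _, std = gp.predict(X, return_std=True)
        std[queried] = -np.inf                               # never re-query known points
        nxt = int(np.argmax(std))                            # most uncertain perturbation next
        queried.append(nxt)
        y.append(query_llm_logprob(perturbations[nxt]))
    gp.fit(X[queried], np.array(y))
    return gp.predict(X)                                     # surrogate scores for all perturbations
```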

AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models
Zihao Zeng | Yibo Miao | Hongcheng Gao | Hao Zhang | Zhijie Deng
Findings of the Association for Computational Linguistics: EMNLP 2024

Mixture of experts (MoE) has become the standard for constructing production-level large language models (LLMs) due to its promise to boost model capacity without causing significant overheads. Nevertheless, existing MoE methods usually enforce a constant top-k routing for all tokens, which is arguably restrictive because different tokens (e.g., “<EOS>” vs. “apple”) may require different numbers of experts for feature abstraction. Lifting such a constraint can help make the most of limited resources and unleash the potential of the model for downstream tasks. In this sense, we introduce **AdaMoE** to realize token-adaptive routing for MoE, where different tokens are permitted to select varying numbers of experts. AdaMoE makes minimal modifications to the vanilla MoE with top-k routing: it simply introduces a fixed number of *null experts*, which do not consume any FLOPs, to the expert set and increases the value of k. AdaMoE does not force each token to occupy a fixed number of null experts but ensures the average usage of the null experts with a load-balancing loss, leading to an adaptive number of null/true experts used by each token. AdaMoE exhibits a strong resemblance to MoEs with expert choice routing while allowing for trivial auto-regressive modeling. AdaMoE is easy to implement and can be effectively applied to pre-trained (MoE-)LLMs. Extensive studies show that AdaMoE can reduce average expert load (FLOPs) while achieving superior performance. For example, on the ARC-C dataset, applying our method to fine-tuning Mixtral-8x7B can reduce FLOPs by 14.5% while increasing accuracy by 1.69%. Code is available at [this link](https://github.com/CengZihao/AdaMoE).
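
To make the null-expert mechanism concrete, here is a minimal sketch of one token-adaptive MoE layer: the router scores true and null experts jointly, and slots routed to null experts are simply skipped, so each token ends up using a varying number of true experts. Module shapes, the weight normalization, and the load-balancing details are assumptions for illustration; the released code at the linked repository is authoritative.

```python
# Minimal sketch of token-adaptive routing with null experts, in the spirit of
# the method described above. Shapes, the weight normalization, and the
# load-balancing loss are illustrative assumptions, not the released code.
import torch
import torch.nn.functional as F

def adamoe_layer(x, router, experts, num_null: int, k: int):
    """x: (tokens, dim); router: nn.Linear(dim, len(experts) + num_null); experts: list of FFN modules."""
    n_true = len(experts)
    assert router.out_features == n_true + num_null
    probs = F.softmax(router(x), dim=-1)               # (tokens, n_true + num_null)
    topk_p, topk_idx = probs.topk(k, dim=-1)           # every token still picks k slots
    true_mask = topk_idx < n_true                      # slots that hit a real expert
    weights = topk_p * true_mask                       # null slots contribute nothing (zero FLOPs)
    weights = weights / weights.sum(-1, keepdim=True).clamp_min(1e-9)  # one simple weighting choice
    out = torch.zeros_like(x)
    for e in range(n_true):                            # null experts are never executed
        tok, slot = (topk_idx == e).nonzero(as_tuple=True)
        if tok.numel():
            out[tok] += weights[tok, slot].unsqueeze(-1) * experts[e](x[tok])
    # Training would add a load-balancing loss over `probs` (null columns included), so the
    # *average* null usage is controlled while each token's true-expert count can vary.
    return out
```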