Yuliang Yan


2026

Large language models (LLMs) are considered valuable intellectual property (IP) due to the enormous computational cost of training, making their protection against malicious theft or unauthorized deployment crucial. Despite efforts in watermarking and fingerprinting, existing methods either affect text generation or rely on white-box access, limiting their practicality. To address this, we propose DuFFin, a novel Dual-Level Fingerprinting framework for black-box ownership verification. DuFFin jointly extracts trigger patterns and knowledge-level fingerprints to identify the source of a suspect model. We conduct experiments on diverse open-source models, including four popular base LLMs and their fine-tuned, quantized, and safety-aligned variants released by large companies, start-ups, and individuals. Results show that DuFFin accurately verifies the copyright of protected LLMs across their variants, achieving an IP-ROC greater than 0.99. Our code is available at https://github.com/yuliangyan0807/llm-fingerprint.

2023

Large Language Models (LLMs) have made remarkable advancements in the field of natural language generation. However, the propensity of LLMs to generate inaccurate or non-factual content, termed “hallucinations”, remains a significant challenge. Current hallucination detection methods often require retrieving large amounts of relevant evidence, thereby increasing response times. We introduce a framework that leverages statistical decision theory and Bayesian sequential analysis to optimize the trade-off between costs and benefits during hallucination detection. This approach does not require a predetermined number of observations. Instead, the analysis proceeds sequentially, enabling an expeditious decision of “belief” or “disbelief” through a stop-or-continue strategy. Extensive experiments reveal that this framework surpasses existing methods in both the efficiency and precision of hallucination detection. Furthermore, it requires fewer retrieval steps on average, thus decreasing response times.
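The stop-or-continue idea above can be illustrated with a classical sequential test. The sketch below is a minimal, hypothetical example using Wald's sequential probability ratio test with Bernoulli evidence; the function name, the `p_support`/`p_hallucinated` parameters, and the error rates are illustrative assumptions, not the paper's actual procedure. Each retrieved piece of evidence updates a running log-likelihood ratio, and retrieval stops as soon as either decision threshold is crossed, so the number of observations is not fixed in advance.

```python
import math

def sequential_belief_test(observations, p_support, p_hallucinated,
                           alpha=0.05, beta=0.05):
    """Wald-style SPRT over binary evidence outcomes.

    observations: iterable of 1 (evidence agrees with the claim) or 0.
    p_support: probability that evidence agrees if the claim is true.
    p_hallucinated: probability that evidence agrees if hallucinated.
    alpha, beta: tolerated type-I and type-II error rates.
    """
    upper = math.log((1 - beta) / alpha)   # cross -> decide "belief"
    lower = math.log(beta / (1 - alpha))   # cross -> decide "disbelief"
    llr = 0.0
    for step, agrees in enumerate(observations, start=1):
        # Bernoulli likelihood ratio for this single observation.
        p_true = p_support if agrees else 1 - p_support
        p_hall = p_hallucinated if agrees else 1 - p_hallucinated
        llr += math.log(p_true / p_hall)
        if llr >= upper:
            return "belief", step
        if llr <= lower:
            return "disbelief", step
    # Evidence exhausted before either threshold was crossed.
    return "undecided", len(observations)
```

With strongly agreeing evidence the test typically halts after only a couple of observations, which is the source of the reduced average retrieval cost described above.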
Recently, much research in psychology has benefited from advances in machine learning techniques. Some recent studies have shown that it is possible to build automated scoring models for children’s mindreading. These models were trained on manually labeled question-response pairs, collected by asking children to answer one or two questions after a short story was told or a video clip was played. However, existing models do not take the features of the stories and video clips into account when scoring, which reduces the accuracy of the scoring models. Furthermore, because different psychological tests may contain the same questions, this approach cannot be extended to other related psychological test datasets. In this study, we propose a multi-modal learning framework that leverages features extracted from the stories and videos related to the questions asked during children’s mindreading evaluation. Experimental results show that the scores produced by the proposed models agree well with those assigned by human experts, highlighting the potential of the proposed network architecture for practical automated children’s mindreading scoring systems.