Ziqi Zhang

2025

Large Vision-Language Models (LVLMs) suffer from serious hallucination problems, where the model-generated responses are inconsistent with the visual inputs. Existing hallucination mitigation methods are mainly based on preference alignment and require external human annotations or auxiliary models for preference data collection, which increase costs and limit sustainable improvement. To tackle these challenges, we propose Autonomous Preference Alignment via Self-Injection (APASI), a novel and generalizable method that mitigates hallucinations without external dependencies. APASI leverages the target LVLM to self-inject hallucinations into a generated response, creating a pair of responses with varying preference levels. During the self-injection process, the dis-preferred response is generated based on three key observations of hallucinations, ensuring it simulates real hallucination patterns. This fidelity offers an accurate learning signal for hallucination mitigation. Moreover, APASI incorporates an iterative alignment training strategy combined with curriculum learning to periodically update the preference data with increasing challenge, enabling stable and continuous enhancement of the LVLM. Extensive experiments across six benchmarks show that APASI not only effectively mitigates hallucinations for three baseline models but also achieves comparable or even superior performance to alignment-based methods with external dependency, thereby demonstrating its effectiveness and generalization capability.

pdf bib abs

From BERT to LLMs: Comparing and Understanding Chinese Classifier Prediction in Language Models
Ziqi Zhang | Jianfei Ma | Emmanuele Chersoni | You Jieshun | Zhaoxin Feng
Proceedings of the 8th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP

Classifiers are an important and defining feature of the Chinese language, and their correct prediction is key to numerous educational applications. Yet, whether the most popular Large Language Models (LLMs) possess proper knowledge the Chinese classifiers is an issue that has largely remain unexplored in the Natural Language Processing (NLP) literature.To address such a question, we employ various masking strategies to evaluate the LLMs’ intrinsic ability, the contribution of different sentence elements, and the working of the attention mechanisms during prediction. Besides, we explore fine-tuning for LLMs to enhance the classifier performance.Our findings reveal that LLMs perform worse than BERT, even with fine-tuning. The prediction, as expected, greatly benefits from the information about the following noun, which also explains the advantage of models with a bidirectional attention mechanism such as BERT.

pdf bib abs

Membership and Memorization in LLM Knowledge Distillation
Ziqi Zhang | Ali Shahin Shamsabadi | Hanxiao Lu | Yifeng Cai | Hamed Haddadi
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Recent advances in Knowledge Distillation (KD) aim to mitigate the high computational demands of Large Language Models (LLMs) by transferring knowledge from a large ”teacher” to a smaller ”student” model. However, students may inherit the teacher’s privacy when the teacher is trained on private data. In this work, we systematically characterize and investigate membership privacy risks inherent in six LLM KD techniques.Using instruction-tuning settings that span seven NLP tasks, together with three teacher model families (GPT-2, LLAMA-2, and OPT), and various size student models, we demonstrate that all existing LLM KD approaches carry membership and memorization privacy risks from the teacher to its students. However, the extent of privacy risks varies across different KD techniques. We systematically analyse how key LLM KD components (KD objective functions, student training data and NLP tasks) impact such privacy risks. We also demonstrate a significant disagreement between memorization and membership privacy risks of LLM KD techniques. Finally, we characterize per-block privacy risk and demonstrate that the privacy risk varies across different blocks by a large margin.

pdf bib abs

PhonoThink: Improving Large Language Models’ Reasoning on Chinese Phonological Ambiguities
Jianfei Ma | Zhaoxin Feng | Emmanuele Chersoni | Huacheng Song | Ziqi Zhang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Effectively resolving phonological ambiguities is crucial for robust natural language processing, as these ambiguities are pervasive in tasks ranging from speech-to-text, spelling correction, to offensive language detection. However, current Large Language Models (LLMs) frequently struggle to resolve such ambiguities.To address this challenge, we present a framework to enhances LLMs’ phonological capability through a multiple-stage training approach. Our method begins with supervised fine-tuning on well-constructed datasets, including three subtask datasets designed to enhance the model’s foundational phonological knowledge, along with a synthetic dataset of step-by-step reasoning chains. Following this, we apply reinforcement learning to incentivize and stabilize its reasoning.Results show that our framework enables the base model to achieve relatively comparable performance to a much larger model. Our ablation studies reveal that subtask datasets and the synthetic dataset can simultaneously impact as complementary modular enhancers to strengthen LLMs’ integrated application.

2024

pdf bib abs

Nash CoT: Multi-Path Inference with Preference Equilibrium
Ziqi Zhang | Cunxiang Wang | Xiao Xiong | Yue Zhang | Donglin Wang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Chain of thought (CoT) is a reasoning framework that can enhance the performance of large language models (LLMs) on complex inference tasks. In particular, among various studies related to CoT, multi-path inference stands out as a simple yet effective improvement. However, there is no optimal setting for the number of inference paths. Therefore, we have to increase the number of inference paths to obtain better results, which in turn increases the inference cost. To address this limitation, we can utilize question-related role templates to guide LLMs into relevant roles, thereby increasing the possibility of correct inferences for each path and further reducing dependence on the number of inference paths while improving reasoning accuracy. However, placing LLMs into specific roles may reduce their reasoning diversity and performance on a few tasks where role dependence is low. To alleviate the excessive immersion of the LLM into a specific role, we propose Nash CoT by constructing a competitive system on each path that balances the generation from role-specific LLMs’ and the general LLMs’ generation, thereby ensuring both effective role adoption and diversity in LLM generation further maintaining the performance of multi-path inference while reducing the requirement of the number of inference paths. We evaluate Nash CoT across various inference tasks, including Arabic Reasoning, Commonsense Question Answering, and Symbolic Inference, achieving results that are comparable to or better than those of multi-path CoT with the equal number of inference paths.

2016

pdf bib abs

JATE 2.0: Java Automatic Term Extraction with Apache Solr
Ziqi Zhang | Jie Gao | Fabio Ciravegna
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Automatic Term Extraction (ATE) or Recognition (ATR) is a fundamental processing step preceding many complex knowledge engineering tasks. However, few methods have been implemented as public tools and in particular, available as open-source freeware. Further, little effort is made to develop an adaptable and scalable framework that enables customization, development, and comparison of algorithms under a uniform environment. This paper introduces JATE 2.0, a complete remake of the free Java Automatic Term Extraction Toolkit (Zhang et al., 2008) delivering new features including: (1) highly modular, adaptable and scalable ATE thanks to integration with Apache Solr, the open source free-text indexing and search platform; (2) an extended collection of state-of-the-art algorithms. We carry out experiments on two well-known benchmarking datasets and compare the algorithms along the dimensions of effectiveness (precision) and efficiency (speed and memory consumption). To the best of our knowledge, this is by far the only free ATE library offering a flexible architecture and the most comprehensive collection of algorithms.

2013

pdf bib

Mining Equivalent Relations from Linked Data
Ziqi Zhang | Anna Lisa Gentile | Isabelle Augenstein | Eva Blomqvist | Fabio Ciravegna
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

pdf bib abs

Automatically Extracting Procedural Knowledge from Instructional Texts using Natural Language Processing
Ziqi Zhang | Philip Webster | Victoria Uren | Andrea Varga | Fabio Ciravegna
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Procedural knowledge is the knowledge required to perform certain tasks, and forms an important part of expertise. A major source of procedural knowledge is natural language instructions. While these readable instructions have been useful learning resources for human, they are not interpretable by machines. Automatically acquiring procedural knowledge in machine interpretable formats from instructions has become an increasingly popular research topic due to their potential applications in process automation. However, it has been insufficiently addressed. This paper presents an approach and an implemented system to assist users to automatically acquire procedural knowledge in structured forms from instructions. We introduce a generic semantic representation of procedures for analysing instructions, using which natural language techniques are applied to automatically extract structured procedures from instructions. The method is evaluated in three domains to justify the generality of the proposed semantic representation as well as the effectiveness of the implemented automatic system.

2011

pdf bib

Harnessing different knowledge sources to measure semantic relatedness under a uniform model
Ziqi Zhang | Anna Lisa Gentile | Fabio Ciravegna
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

pdf bib abs

A Random Graph Walk based Approach to Computing Semantic Relatedness Using Knowledge from Wikipedia
Ziqi Zhang | Anna Lisa Gentile | Lei Xia | José Iria | Sam Chapman
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Determining semantic relatedness between words or concepts is a fundamental process to many Natural Language Processing applications. Approaches for this task typically make use of knowledge resources such as WordNet and Wikipedia. However, these approaches only make use of limited number of features extracted from these resources, without investigating the usefulness of combining various different features and their importance in the task of semantic relatedness. In this paper, we propose a random walk model based approach to measuring semantic relatedness between words or concepts, which seamlessly integrates various features extracted from Wikipedia to compute semantic relatedness. We empirically study the usefulness of these features in the task, and prove that by combining multiple features that are weighed according to their importance, our system obtains competitive results, and outperforms other systems on some datasets.

pdf bib abs

Improving Domain-specific Entity Recognition with Automatic Term Recognition and Feature Extraction
Ziqi Zhang | José Iria | Fabio Ciravegna
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Domain specific entity recognition often relies on domain-specific knowledge to improve system performance. However, such knowledge often suffers from limited domain portability and is expensive to build and maintain. Therefore, obtaining it in a generic and unsupervised manner would be a desirable feature for domain-specific entity recognition systems. In this paper, we introduce an approach that exploits domain-specificity of words as a form of domain-knowledge for entity-recognition tasks. Compared to prior work in the field, our approach is generic and completely unsupervised. We empirically show an improvement in entity extraction accuracy when features derived by our unsupervised method are used, with respect to baseline methods that do not employ domain knowledge. We also compared the results against those of existing systems that use manually crafted domain knowledge, and found them to be competitive.

2009

pdf bib

A Novel Approach to Automatic Gazetteer Generation using Wikipedia
Ziqi Zhang | José Iria
Proceedings of the 2009 Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources (People’s Web)

pdf bib

Too Many Mammals: Improving the Diversity of Automatically Recognized Terms
Ziqi Zhang | Lei Xia | Mark A. Greenwood | José Iria
Proceedings of the International Conference RANLP-2009

2008

pdf bib abs

A Comparative Evaluation of Term Recognition Algorithms
Ziqi Zhang | Jose Iria | Christopher Brewster | Fabio Ciravegna
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Automatic Term recognition (ATR) is a fundamental processing step preceding more complex tasks such as semantic search and ontology learning. From a large number of methodologies available in the literature only a few are able to handle both single and multi-word terms. In this paper we present a comparison of five such algorithms and propose a combined approach us¬ing a voting mechanism. We evaluated the six approaches using two different corpora and show how the voting algo¬rithm performs best on one corpus (a collection of texts from Wikipedia) and less well using the Genia corpus (a standard life science corpus). This indicates that choice and design of corpus has a major impact on the evaluation of term recog¬nition algorithms. Our experiments also showed that single-word terms can be equally important and occupy a fairly large proportion in certain domains. As a result, algorithms that ignore single-word terms may cause problems to tasks built on top of ATR. Effective ATR systems also need to take into account both the unstructured text and the structured aspects and this means information extraction techniques need to be integrated into the term recognition process.