Tong Zhou

2025

CiteLab: Developing and Diagnosing LLM Citation Generation Workflows via the Human-LLM Interaction
Jiajun Shen | Tong Zhou | Yubo Chen | Kang Liu | Jun Zhao
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

The emerging paradigm of enabling Large Language Models (LLMs) to generate citations in Question-Answering (QA) tasks is lacking in a unified framework to standardize and fairly compare different citation generation methods, leading to difficulties in reproduction and innovation. Therefore, we introduce Citeflow, an open-source and modular framework fostering reproduction and the implementation of new designs. Citeflow is highly extensible, allowing users to utilize four main modules and 14 components to construct a pipeline, evaluate an existing method, and understand the attributing LLM-generated contents. The framework is also paired with a visual interface, Citefix, facilitating case study and modification of existing citation generation methods. Users can use this interface to conduct LLM-powered case studies according to different scenarios. Citeflow and Citefix are highly integrated into the toolkit CiteLab, and we use an authentic process of multiple rounds of improvement through the Human-LLM interaction interface to demonstrate the efficiency of our toolkit on implementing and modifying citation generation pipelines. Citelab is released at https://github.com/SjJ1017/Citelab

pdf bib abs

While hallucinations of large language models could be alleviated through retrieval-augmented generation and citation generation, how the model utilizes internal knowledge is still opaque, and the trustworthiness of its generated answers remains questionable. In this work, we introduce Context-Prior Augmented Citation Generation task, requiring models to generate citations considering both external and internal knowledge while providing trustworthy references, with 5 evaluation metrics focusing on 3 aspects: answer helpfulness, citation faithfulness, and trustworthiness. We introduce RAEL, the paradigm for our task, and also design INTRALIGN, an integrated method containing customary data generation and an alignment algorithm. Our experimental results show that our method achieves a better cross-scenario performance with regard to other baselines. Our extended experiments further reveal that retrieval quality, question types, and model knowledge have considerable influence on the trustworthiness in citation generation.

pdf bib abs

MotivGraph-SoIQ: Integrating Motivational Knowledge Graphs and Socratic Dialogue for Enhanced LLM Ideation
Xinping Lei | Tong Zhou | Yubo Chen | Kang Liu | Jun Zhao
Findings of the Association for Computational Linguistics: EMNLP 2025

Large Language Models (LLMs) hold significant promise for accelerating academic ideation but face critical challenges in grounding ideas and mitigating confirmation bias during refinement. To address these limitations, we propose MotivGraph-SoIQ, a novel framework that enhances LLM ideation by integrating a Motivational Knowledge Graph (MotivGraph), which provides essential grounding from research literature, with a Q-Driven Socratic Ideator. The Ideator, a dual-agent system utilizing Socratic questioning, facilitates a rigorous refinement process that mitigates confirmation bias and significantly improves idea quality across dimensions of novelty, experimental feasibility, and motivation. Our experimental results demonstrate MotivGraph-SoIQ’s effectiveness. Comparative studies show significant quantitative improvements over SOTA methods across LLM-based scoring, ELO ranking, and human evaluation. Ablation studies further validate the crucial contributions of both the MotivGraph for enhancing idea novelty and practicality, and the Socratic dialogue with the teacher agent for substantial quality improvement. This work underscores the potential of combining structured knowledge with interactive, critique-based refinement for robust LLM ideation.

pdf bib

DTELS: Towards Dynamic Granularity of Timeline Summarization
Chenlong Zhang | Tong Zhou | Pengfei Cao | Zhuoran Jin | Yubo Chen | Kang Liu | Jun Zhao
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

2024

pdf bib abs

Open Event Causality Extraction by the Assistance of LLM in Task Annotation, Dataset, and Method
Kun Luo | Tong Zhou | Yubo Chen | Jun Zhao | Kang Liu
Proceedings of the Workshop: Bridging Neurons and Symbols for Natural Language Processing and Knowledge Graphs Reasoning (NeusymBridge) @ LREC-COLING-2024

Event Causality Extraction (ECE) aims to extract explicit causal relations between event pairs from the text. However, the event boundary deviation and the causal event pair mismatching are two crucial challenges that remain unaddressed. To address the above issues, we propose a paradigm to utilize LLM to optimize the task definition, evolve the datasets, and strengthen our proposed customized Contextual Highlighting Event Causality Extraction framework (CHECE). Specifically in CHECE, we propose an Event Highlighter and an Event Concretization Module, guiding the model to represent the event by a higher-level cluster and consider its causal counterpart in event boundary prediction to deal with event boundary deviation. And we propose a Contextual Event Causality Matching mechanism, meanwhile, applying LLM to diversify the content templates to force the model to learn causality from context to targeting on causal event pair mismatching. Experimental results on two ECE datasets demonstrate the effectiveness of our method.

pdf bib abs

CogMG: Collaborative Augmentation Between Large Language Model and Knowledge Graph
Tong Zhou | Yubo Chen | Kang Liu | Jun Zhao
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

Large language models have become integral to question-answering applications despite their propensity for generating hallucinations and factually inaccurate content. Querying knowledge graphs to reduce hallucinations in LLM meets the challenge of incomplete knowledge coverage in knowledge graphs. On the other hand, updating knowledge graphs by information extraction and knowledge graph completion faces the knowledge update misalignment issue. In this work, we introduce a collaborative augmentation framework, CogMG, leveraging knowledge graphs to address the limitations of LLMs in QA scenarios, explicitly targeting the problems of incomplete knowledge coverage and knowledge update misalignment. The LLMs identify and decompose required knowledge triples that are not present in the KG, enriching them and aligning updates with real-world demands. We demonstrate the efficacy of this approach through a supervised fine-tuned LLM within an agent framework, showing significant improvements in reducing hallucinations and enhancing factual accuracy in QA responses. Our code and video are publicly available.

pdf bib abs

Landmark Embedding: A Chunking-Free Embedding Method For Retrieval Augmented Long-Context Large Language Models
Kun Luo | Zheng Liu | Shitao Xiao | Tong Zhou | Yubo Chen | Jun Zhao | Kang Liu
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Retrieval augmentation is a promising approach to handle long-context language modeling. However, the existing retrieval methods usually work with the chunked context, which is prone to inferior quality of semantic representation and incomplete retrieval of useful information. In this work, we propose a new method for the retrieval augmentation of long-context language modeling, called Landmark Embedding. Our method is characterized by threefold technical contributions. Firstly, we introduce a chunking-free architecture, which keeps the long context coherent such that high-quality embeddings can be generated for the fine-grained units within the context. Secondly, we present a position-aware objective function, which prioritizes the ultimate boundary for a consecutive span of information. By learning to discriminate such a special position, the useful information can be comprehensively retrieved for the query. Thirdly, we design a novel multi-stage learning algorithm, which makes the best use of readily available data and synthetic data for cost-effective training of the landmark embedding. In our experimental study, landmark embedding is able to substantially improve the performance for both LLaMA-2 and ChatGPT in a variety of long-context tasks; meanwhile, it also outperforms the existing retrieval methods with a notable advantage. Our model and source code will be made publicly available.

2023

pdf bib abs

Event extraction aims to recognize pre-defined event triggers and arguments from texts, which suffer from the lack of high-quality annotations. In most NLP applications, involving a large scale of synthetic training data is a practical and effective approach to alleviate the problem of data scarcity. However, when applying to the task of event extraction, recent data augmentation methods often neglect the problem of grammatical incorrectness, structure misalignment, and semantic drifting, leading to unsatisfactory performances. In order to solve these problems, we propose a denoised structure-to-text augmentation framework for event extraction (DAEE), which generates additional training data through the knowledge-based structure-to-text generation model and selects the effective subset from the generated data iteratively with a deep reinforcement learning agent. Experimental results on several datasets demonstrate that the proposed method generates more diverse text representations for event extraction and achieves comparable results with the state-of-the-art.

2021

pdf bib abs

Automatic ICD Coding via Interactive Shared Representation Networks with Self-distillation Mechanism
Tong Zhou | Pengfei Cao | Yubo Chen | Kang Liu | Jun Zhao | Kun Niu | Weifeng Chong | Shengping Liu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

The ICD coding task aims at assigning codes of the International Classification of Diseases in clinical notes. Since manual coding is very laborious and prone to errors, many methods have been proposed for the automatic ICD coding task. However, existing works either ignore the long-tail of code frequency or the noisy clinical notes. To address the above issues, we propose an Interactive Shared Representation Network with Self-Distillation Mechanism. Specifically, an interactive shared representation network targets building connections among codes while modeling the co-occurrence, consequently alleviating the long-tail problem. Moreover, to cope with the noisy text issue, we encourage the model to focus on the clinical note’s noteworthy part and extract valuable information through a self-distillation learning mechanism. Experimental results on two MIMIC datasets demonstrate the effectiveness of our method.

pdf bib abs

This is the system description of the CASIA_Unisound team for Task 1, Task 7b, and Task 8 of the sixth Social Media Mining for Health Applications (SMM4H) shared task in 2021. Targeting on deal with two shared challenges, the colloquial text and the imbalance annotation, among those tasks, we apply a customized pre-trained language model and propose various training strategies. Experimental results show the effectiveness of our system. Moreover, we got an F1-score of 0.87 in task 8, which is the highest among all participates.