Tongliang Liu

2026

Medical coding converts free-text clinical notes into standardized diagnostic and procedural codes, which are essential for billing, hospital operations, and medical research. Unlike ordinary text classification, it requires multi-step reasoning: extracting diagnostic concepts, applying guideline constraints, mapping to hierarchical codebooks, and ensuring cross-document consistency. Recent advances leverage agentic LLMs, but most rely on rigid, manually crafted workflows that fail to capture the nuance and variability of real-world documentation, leaving open the question of how to systematically learn effective workflows. We present MedDCR, a closed-loop framework that treats workflow design as a learning problem. A Designer proposes workflows, a Coder executes them, and a Reflector evaluates predictions and provides constructive feedback, while a memory archive preserves prior designs for reuse and iterative refinement. On benchmark datasets, MedDCR outperforms state-of-the-art baselines and produces interpretable, adaptable workflows that better reflect real coding practice, improving both the reliability and trustworthiness of automated systems.
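The closed loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify any interfaces, so `Attempt`, `refine_workflow`, and the three callables (`design`, `execute`, `reflect`) are hypothetical names, and in the actual system each role would presumably be backed by an LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Attempt:
    """One archived design: the workflow, its score, and the reflector's feedback."""
    workflow: str
    score: float
    feedback: str


def refine_workflow(
    design: Callable[[List[Attempt]], str],                     # hypothetical Designer
    execute: Callable[[str, str], str],                         # hypothetical Coder
    reflect: Callable[[str, Sequence[str]], Tuple[float, str]],  # hypothetical Reflector
    notes: Sequence[str],
    rounds: int = 3,
) -> Attempt:
    """Run a design -> execute -> reflect loop and return the best archived attempt."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    archive: List[Attempt] = []  # memory archive of prior designs
    for _ in range(rounds):
        # The designer sees all earlier attempts, including their feedback.
        workflow = design(archive)
        # The coder applies the proposed workflow to every clinical note.
        predictions = [execute(workflow, note) for note in notes]
        # The reflector scores the predictions and explains what to change.
        score, feedback = reflect(workflow, predictions)
        archive.append(Attempt(workflow, score, feedback))
    return max(archive, key=lambda attempt: attempt.score)
```

Passing the whole archive to the designer is one simple way to model reuse of prior designs; how the archive is actually queried is not stated in the abstract.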


Select Before Use: On the Importance of Reference Model Selection in Preference Alignment
Muyang Li | Runze Wu | Xiangyu Zhao | Bo Han | Daoyi Dong | Tongliang Liu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The post-training stage of Large Language Models (LLMs) typically involves Supervised Fine-Tuning (SFT) followed by preference alignment to ensure that the model generates safe, helpful, and instruction-aligned content. The SFT model critically serves as both the initialization and the reference model for subsequent preference alignment. However, an essential yet often neglected question is which SFT checkpoint to select for this role. We show that checkpoint selection substantially affects final performance, and that the common practice of choosing the minimum validation-loss checkpoint often fails, due to a fundamental conflict between SFT’s focus on imitation and alignment’s goal of response discriminability. To this end, we propose RewardRank, a simple, effective, training-free metric for estimating the initial implicit alignment between the reference model and the preference objective. Empirical evidence suggests that using our selected model as the reference can yield up to a 67.6% relative increase in length-controlled win rate on the popular Zephyr recipe compared to baselines.

2024


Contemporary practices in instruction tuning often hinge on scaling up data without a clear strategy for ensuring data quality, inadvertently introducing noise that may compromise model performance. To address this challenge, we introduce Nuggets, a novel and efficient methodology that leverages one-shot learning to discern and select high-quality instruction data from extensive datasets. Nuggets assesses the potential of individual instruction examples to act as effective one-shot learning instances, thereby identifying those that can significantly improve performance across diverse tasks. Nuggets utilizes a scoring system based on the impact of candidate examples on the perplexity of a diverse anchor set, facilitating the selection of the most advantageous data for instruction tuning. Through rigorous evaluations on two benchmarks, namely MT-Bench and Alpaca-Eval, our study illustrates that instruction tuning with the top 1% of examples curated by Nuggets substantially outperforms conventional methods employing the entire dataset.
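One way to read the scoring idea above is sketched below. It is an illustration under stated assumptions, not the paper's exact procedure: `loss_fn` is a hypothetical callable returning the model's mean token loss for an answer given a prompt (perplexity is the exponential of that loss, so comparing losses orders examples the same way), and the prompt template, `one_shot_score`, and `select_top` are names invented here.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]             # (instruction, answer)
LossFn = Callable[[str, str], float]  # hypothetical: mean token loss of answer given prompt


def one_shot_score(candidate: Example, anchors: Sequence[Example], loss_fn: LossFn) -> float:
    """Fraction of anchor examples whose loss drops when the candidate is used as a one-shot demo."""
    if not anchors:
        raise ValueError("anchor set must not be empty")
    # Assumed prompt template: the candidate is simply prepended as a demonstration.
    demo = f"{candidate[0]}\n{candidate[1]}\n\n"
    improved = 0
    for instruction, answer in anchors:
        zero_shot = loss_fn(instruction, answer)
        one_shot = loss_fn(demo + instruction, answer)
        if one_shot < zero_shot:
            improved += 1
    return improved / len(anchors)


def select_top(
    candidates: Sequence[Example],
    anchors: Sequence[Example],
    loss_fn: LossFn,
    fraction: float = 0.01,
) -> List[Example]:
    """Keep the highest-scoring fraction of candidates (at least one)."""
    ranked = sorted(
        candidates,
        key=lambda c: one_shot_score(c, anchors, loss_fn),
        reverse=True,
    )
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]
```

With a real model, `loss_fn` would wrap a forward pass over the prompt and answer; each candidate costs one extra pass per anchor, so the anchor set is best kept small.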