Huan Liu - ACL Anthology

Huan Liu

Also published as: 欢刘

Papers on this page may belong to more than one person named Huan Liu.

2026

ETR: Entropy Trend Reward for Efficient Chain-of-Thought Reasoning
Xuan Xiong | Huan Liu | Li Gu | Zhixiang Chi | Yue Qiu | Yuanhao YU | Yang Wang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Chain-of-thought (CoT) reasoning improves large language model performance on complex tasks, but often produces excessively long and inefficient reasoning traces. Existing methods shorten CoTs using length penalties or global entropy reduction, implicitly assuming that low uncertainty is desirable throughout reasoning. We show instead that reasoning efficiency is governed by the trajectory of uncertainty: CoTs with dominant downward entropy trends are substantially shorter. Motivated by this insight, we propose Entropy Trend Reward (ETR), a trajectory-aware objective that encourages progressive uncertainty reduction while allowing limited local exploration. We integrate ETR into Group Relative Policy Optimization (GRPO) and evaluate it across multiple reasoning models and challenging benchmarks. ETR consistently achieves a superior accuracy–efficiency trade-off, improving DeepSeek-R1-Distill-7B by +9.9% accuracy while reducing CoT length by 67% across four benchmarks.
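The abstract describes rewarding a downward entropy trend inside GRPO. As a rough illustration only, the Python sketch below measures the trend as the least-squares slope of per-step token entropy and then applies the usual GRPO normalization of rewards within a sampled group. The `trend_reward` form and its `scale` parameter are hypothetical assumptions, not the ETR objective; the sketch does not model how ETR tolerates limited local exploration or how it combines the trend term with answer correctness.

```python
import math
from statistics import mean, pstdev
from typing import List


def token_entropy(probs: List[float]) -> float:
    """Shannon entropy (nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_slope(entropies: List[float]) -> float:
    """Least-squares slope of entropy vs. step index; negative means a downward trend."""
    n = len(entropies)
    if n < 2:
        return 0.0
    x_bar = (n - 1) / 2
    y_bar = mean(entropies)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(entropies))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


def trend_reward(entropies: List[float], scale: float = 1.0) -> float:
    """Hypothetical trend term (not ETR's formula): positive when entropy falls."""
    return -scale * entropy_slope(entropies)


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: normalize each reward by its group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Toy entropy traces (one value per reasoning step), not real model outputs.
    falling = [2.0, 1.6, 1.1, 0.7]
    rising = [0.7, 1.1, 1.6, 2.0]
    print(round(entropy_slope(falling), 2))  # prints -0.44
    print(trend_reward(falling) > 0 > trend_reward(rising))  # prints True
    adv = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
    print([round(a, 3) for a in adv])  # prints [1.0, -1.0, 1.0, -1.0]
```

Normalizing within the sampled group is what makes the advantages relative: a trace is pushed up only if it scores better than its siblings. The population standard deviation is used here; implementations differ on this detail.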

One Pair Suffices: Unlocking Universal Zero-Shot Translation via Cross-Architecture Alignment
Hao Zong | Cong Hu Yuan | Chao Bei | Wentao Chen | Huan Liu | Kaiyu Huang | Degen Huang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Current paradigms for empowering Large Language Models (LLMs) with multilingual capabilities rely heavily on massive instruction tuning. We challenge this view, proposing that the barrier is topological alignment, not data quantity. We introduce Hybrid Cross-Alignment (HCA), fusing a frozen NLLB encoder with a Qwen decoder via a closed-loop dual-adapter architecture. HCA utilizes a Source-Side Adapter to precondition encoder features and a Query-Residual Adapter to preserve generative stability, bridged by an adaptive gated cross-modal interface. Our core discovery is “Universal Alignment Generalization.” We demonstrate that training HCA on a single language pair (German-English) unlocks state-of-the-art zero-shot transfer to dozens of unseen languages. Crucially, our “Oracle” experiments reveal that this single-pair training recovers over 96.7% of the performance achievable by training on all available pairs. This proves that a universal, language-agnostic projection protocol exists. With a total inference footprint of 5.25B parameters, our model significantly outperforms larger baselines, surpassing TowerPlus-9B (+9.0 COMET on low-resource languages) and Aya-101 (13B). Furthermore, performance scales linearly with encoder size; upgrading from 600M to 1.3B yields immediate gains (+3.4 points on Gujarati) with minimal retraining cost.

2025

DLUT and GTCOM’s Large Language Model Based Translation System for WMT25
Hao Zong | Chao Bei | Wentao Chen | Conghu Yuan | Huan Liu | Degen Huang
Proceedings of the Tenth Conference on Machine Translation

This paper presents the submission from Dalian University of Technology (DLUT) and Global Tone Communication Technology Co., Ltd. (GTCOM) to the WMT25 General Machine Translation Task. Amidst the paradigm shift from specialized encoder-decoder models to general-purpose Large Language Models (LLMs), this work conducts a systematic comparison of both approaches across five language pairs. For traditional Neural Machine Translation (NMT), we build strong baselines using deep Transformer architectures enhanced with data augmentation. For the LLM paradigm, we explore zero-shot performance and two distinct supervised fine-tuning (SFT) strategies: direct translation and translation refinement. Our key findings reveal a significant discrepancy between lexical and semantic evaluation metrics: while strong NMT systems remain competitive in BLEU scores, fine-tuned LLMs demonstrate marked superiority in semantic fidelity as measured by COMET. Furthermore, we find that fine-tuning LLMs for direct translation is more effective than for refinement, suggesting that teaching the core task directly is preferable to correcting baseline outputs.

Sibyl: Empowering Empathetic Dialogue Generation in Large Language Models via Sensible and Visionary Commonsense Inference
Lanrui Wang | Jiangnan Li | Chenxu Yang | Zheng Lin | Hongyin Tang | Huan Liu | Yanan Cao | Jingang Wang | Weiping Wang
Proceedings of the 31st International Conference on Computational Linguistics

Recently, there has been a heightened interest in building chatbots based on Large Language Models (LLMs) to emulate human-like qualities in multi-turn conversations. Despite having access to commonsense knowledge to better understand the psychological aspects and causality of dialogue context, even these powerful LLMs struggle to achieve the goals of empathy and emotional support. Current commonsense knowledge derived from dialogue contexts is inherently limited and often fails to adequately anticipate the future course of a dialogue. This lack of foresight can mislead LLMs and hinder their ability to provide effective support. In response to this challenge, we present an innovative framework named Sensible and Visionary Commonsense Knowledge (Sibyl). Designed to concentrate on the immediately succeeding dialogue, this paradigm equips LLMs with the capability to uncover the implicit requirements of the conversation, aiming to elicit more empathetic responses. Experimental results demonstrate that incorporating our paradigm for acquiring commonsense knowledge into LLMs comprehensively enhances the quality of their responses.

Predicting and Evaluating Item Responses Using Machine Learning, Text Embeddings, and LLMs
Evelyn Johnson | Hsin-Ro Wei | Tong Wu | Huan Liu
Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Works in Progress

This work-in-progress study compares the accuracy of machine learning and large language models in predicting student responses to field-test items on a social-emotional learning assessment. We evaluate how well each method replicates actual responses and compare the item parameters generated from synthetic data with those derived from actual student data.

2024

Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales
Ayushi Nirmal | Amrita Bhattacharjee | Paras Sheth | Huan Liu
Proceedings of the 8th Workshop on Online Abuse and Harms (WOAH 2024)

Although social media platforms are a prominent arena for users to engage in interpersonal discussions and express opinions, the facade and anonymity offered by social media may allow users to spew hate speech and offensive content. Given the massive scale of such platforms, there is a need to automatically identify and flag instances of hate speech. Although several hate speech detection methods exist, most of these black-box methods are not interpretable or explainable by design. To address this lack of interpretability, in this paper we propose using state-of-the-art Large Language Models (LLMs) to extract features in the form of rationales from the input text and using those rationales to train a base hate speech classifier, thereby enabling faithful interpretability by design. Our framework effectively combines the textual understanding capabilities of LLMs and the discriminative power of state-of-the-art hate speech classifiers to make these classifiers faithfully interpretable. Our comprehensive evaluation on a variety of social media hate speech datasets demonstrates: (1) the quality of the LLM-extracted rationales, and (2) the surprising retention of detector performance even after training to ensure interpretability. All code and data will be made available at https://github.com/AmritaBh/shield.

DLUT and GTCOM’s Neural Machine Translation Systems for WMT24
Hao Zong | Chao Bei | Huan Liu | Conghu Yuan | Wentao Chen | Degen Huang
Proceedings of the Ninth Conference on Machine Translation

This paper presents the submission from Global Tone Communication Co., Ltd. and Dalian University of Technology for the WMT24 shared general Machine Translation (MT) task at the Conference on Empirical Methods in Natural Language Processing (EMNLP). Our participation encompasses two language pairs: English to Japanese and Japanese to Chinese. The systems are developed without particular constraints or requirements, facilitating extensive research in machine translation. We emphasize back-translation, utilize multilingual translation models, and apply fine-tuning strategies to improve performance. Additionally, we integrate both human-generated and machine-generated data to fine-tune our models, leading to enhanced translation accuracy. The automatic evaluation results indicate that our system ranks first in terms of BLEU score for the Japanese to Chinese translation.

Can Knowledge Graphs Reduce Hallucinations in LLMs? : A Survey
Garima Agrawal | Tharindu Kumarage | Zeyad Alghamdi | Huan Liu
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Contemporary LLMs are prone to producing hallucinations, stemming mainly from knowledge gaps within the models. To address this critical limitation, researchers employ diverse strategies to augment LLMs by incorporating external knowledge, aiming to reduce hallucinations and enhance reasoning accuracy. Among these strategies, leveraging knowledge graphs as a source of external information has demonstrated promising results. In this survey, we comprehensively review these knowledge-graph-based augmentation techniques in LLMs, focusing on their efficacy in mitigating hallucinations. We systematically categorize these methods into three overarching groups, offering methodological comparisons and performance evaluations. Lastly, this survey explores the current trends and challenges associated with these techniques and outlines potential avenues for future research in this emerging field.

基于蒙古文文本语义辅助的噪声鲁棒蒙古语语音情感识别方法研究(Research on Noise-Robust Mongolian Speech Emotion Recognition Methods Based on Mongolian Text Semantics)
Huan Liu (刘欢) | Kailin Liang (梁凯麟) | Haolin Zuo (左昊麟) | Rui Liu (刘瑞)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

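Speech emotion recognition (SER) in noisy environments aims to mine emotional features from speech signals containing background noise and automatically predict the speaker's emotional state. Although this technology has progressed rapidly for languages such as English and Chinese, SER research in noisy environments for low-resource languages such as Mongolian is still in its infancy, with few relevant datasets or methods. To advance Mongolian SER, this study first constructs a single-speaker speech emotion recognition dataset. Then, to achieve accurate Mongolian SER in noisy environments, we propose MonSER, a text-speech bimodal baseline model for noisy Mongolian SER, in which the text provides additional semantic information for the noisy speech signal. Specifically, our model first extracts spectral features from the noisy speech signal and then uses the multilingual pretrained model XLMBert to encode the Mongolian text corresponding to the speech. The extracted bimodal information is then fused and fed into a classifier to predict the emotion category. We use this dataset to train the model and test its effectiveness. Experimental results show that our bimodal model achieves clearly higher Mongolian SER accuracy across a variety of noise conditions than a unimodal SER system that takes only speech as input. In addition, to simulate real-world scenarios in which the text may be missing, we propose two text masking strategies; these experiments further verify the effectiveness of the text-speech bimodal approach.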

2023

ConDA: Contrastive Domain Adaptation for AI-generated Text Detection
Amrita Bhattacharjee | Tharindu Kumarage | Raha Moraffah | Huan Liu
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

J-Guard: Journalism Guided Adversarially Robust Detection of AI-generated News
Tharindu Kumarage | Amrita Bhattacharjee | Djordje Padejski | Kristy Roschke | Dan Gillmor | Scott Ruston | Huan Liu | Joshua Garland
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

How Reliable Are AI-Generated-Text Detectors? An Assessment Framework Using Evasive Soft Prompts
Tharindu Kumarage | Paras Sheth | Raha Moraffah | Joshua Garland | Huan Liu
Findings of the Association for Computational Linguistics: EMNLP 2023

In recent years, there has been a rapid proliferation of AI-generated text, primarily driven by the release of powerful pre-trained language models (PLMs). To address the misuse of AI-generated text, various high-performing detectors have been developed, including the OpenAI detector and Stanford's DetectGPT. In our study, we ask how reliable these detectors are. We answer this question by designing a novel approach that can prompt any PLM to generate text that evades these high-performing detectors. The proposed approach introduces a universal evasive prompt, a novel type of soft prompt, which guides PLMs in producing “human-like” text that can mislead the detectors. The universal evasive prompt is obtained in two steps: first, we create an evasive soft prompt tailored to a specific PLM through prompt tuning; then, we leverage the transferability of soft prompts to transfer the learned evasive soft prompt from one PLM to another. Employing multiple PLMs in various writing tasks, we conduct extensive experiments to evaluate the efficacy of the evasive soft prompts in evading state-of-the-art detectors.

2022

DUTNLP Machine Translation System for WMT22 General MT Task
Ting Wang | Huan Liu | Junpeng Liu | Degen Huang
Proceedings of the Seventh Conference on Machine Translation (WMT)

This paper describes DUTNLP Lab’s submission to the WMT22 General MT Task on four translation directions: English to/from Chinese and English to/from Japanese under the constrained condition. Our primary systems are built on several Transformer variants that employ wider FFN layers or deeper encoders. The bilingual data are filtered with detailed pre-processing strategies, and four data augmentation methods are combined to enlarge the training data using the provided monolingual data. Several common methods, such as fine-tuning, model ensembling, and post-editing, are also employed to further improve model performance. As a result, our constrained systems achieve BLEU scores of 29.01, 63.87, 41.84, and 24.82 on Chinese-to-English, English-to-Chinese, English-to-Japanese, and Japanese-to-English, respectively.

Adaptive Token-level Cross-lingual Feature Mixing for Multilingual Neural Machine Translation
Junpeng Liu | Kaiyu Huang | Jiuyi Li | Huan Liu | Jinsong Su | Degen Huang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Multilingual neural machine translation aims to translate multiple language pairs in a single model and has shown great success thanks to knowledge transfer across languages through shared parameters. Despite its promise, this share-all paradigm has limited ability to capture language-specific features. The common practice is to insert or search for language-specific networks to balance shared and language-specific features. However, these two types of features are not sufficient to model the complex commonality and divergence across languages, such as locally shared features among similar languages, which leads to sub-optimal transfer, especially in massively multilingual translation. In this paper, we propose a novel token-level feature mixing method that enables the model to capture different features and dynamically determine feature sharing across languages. Based on the observation that tokens in a multilingual model are usually shared by different languages, we insert a feature mixing layer into each Transformer sublayer and model each token representation as a mix of different features, with a proportion indicating its feature preference. In this way, we can perform fine-grained feature sharing and achieve better multilingual transfer. Experimental results on multilingual datasets show that our method outperforms various strong baselines and can be extended to zero-shot translation. Further analyses reveal that our method can capture different linguistic features and bridge the representation gap across languages.
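The idea of modeling each token representation as a proportion-weighted mix of features can be illustrated with the small Python sketch below. It assumes, hypothetically, that each feature type is a plain d x d linear map and that the mixing proportions come from a softmax over per-feature gate scores computed from the token itself; the paper's actual layer design, its placement in each Transformer sublayer, and its training are not reproduced here.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def mix_token(x: Vector, feature_maps: List[Matrix],
              gate_vectors: List[Vector]) -> Tuple[Vector, Vector]:
    """Return (mixed representation, mixing proportions) for one token.

    feature_maps[k]: hypothetical d x d matrix producing feature type k
    gate_vectors[k]: hypothetical vector scoring the token's preference for feature k
    """
    scores = [sum(g * xi for g, xi in zip(gv, x)) for gv in gate_vectors]
    proportions = softmax(scores)  # the token's feature preference, sums to 1
    features = [matvec(w, x) for w in feature_maps]
    mixed = [sum(p * f[i] for p, f in zip(proportions, features))
             for i in range(len(x))]
    return mixed, proportions


if __name__ == "__main__":
    x = [1.0, 2.0]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    double = [[2.0, 0.0], [0.0, 2.0]]
    # Zero gate vectors give equal proportions, so the output averages both features.
    mixed, props = mix_token(x, [identity, double], [[0.0, 0.0], [0.0, 0.0]])
    print(props, mixed)  # prints [0.5, 0.5] [1.5, 3.0]
```

Because the proportions are computed per token, two tokens from the same language can still lean on different features, which is the fine-grained sharing the abstract describes.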

Debiasing Word Embeddings with Nonlinear Geometry
Lu Cheng | Nayoung Kim | Huan Liu
Proceedings of the 29th International Conference on Computational Linguistics

Debiasing word embeddings has been largely limited to individual and independent social categories. However, real-world corpora typically present multiple social categories that may correlate or intersect with each other. For instance, “hair weaves” is stereotypically associated with African American females, but with neither African Americans nor females alone. Therefore, this work studies biases associated with multiple social categories: joint biases induced by the union of different categories, and intersectional biases that do not overlap with the biases of the constituent categories. We first empirically observe that individual biases intersect non-trivially (i.e., over a one-dimensional subspace). Drawing on intersectional theory from social science and on linguistic theory, we then construct an intersectional subspace to debias for multiple social categories using the nonlinear geometry of individual biases. Empirical evaluations corroborate the efficacy of our approach.

2021

DUTNLP Machine Translation System for WMT21 Triangular Translation Task
Huan Liu | Junpeng Liu | Kaiyu Huang | Degen Huang
Proceedings of the Sixth Conference on Machine Translation

This paper describes DUT-NLP Lab’s submission to the WMT21 triangular machine translation shared task. Participants are not allowed to use data other than that provided, and the translation direction of this task is Russian-to-Chinese. We use the Transformer as our baseline model and integrate several techniques to enhance its performance, including data filtering, data selection, fine-tuning, and post-editing. Further, to make use of English resources such as Russian/English and Chinese/English parallel data, the relationship triangle is constructed with multilingual neural machine translation systems. As a result, our submission achieves a BLEU score of 21.9 on Russian-to-Chinese.

Learning to Selectively Learn for Weakly-supervised Paraphrase Generation
Kaize Ding | Dingcheng Li | Alexander Hanbo Li | Xing Fan | Chenlei Guo | Yang Liu | Huan Liu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Paraphrase generation is a longstanding NLP task with diverse applications in downstream NLP tasks. However, the effectiveness of existing efforts relies predominantly on large amounts of gold-labeled data. Although unsupervised approaches have been proposed to alleviate this issue, they may fail to generate meaningful paraphrases due to the lack of supervision signals. In this work, we go beyond the existing paradigms and propose a novel approach to generate high-quality paraphrases from weakly supervised data. Specifically, we tackle the weakly-supervised paraphrase generation problem by: (1) obtaining abundant weakly-labeled parallel sentences via retrieval-based pseudo paraphrase expansion; and (2) developing a meta-learning framework to progressively select valuable samples for fine-tuning the pre-trained language model BART on the sentential paraphrasing task. We demonstrate that our approach achieves significant improvements over existing unsupervised approaches and is even comparable in performance with supervised state-of-the-art methods.

Mitigating Bias in Session-based Cyberbullying Detection: A Non-Compromising Approach
Lu Cheng | Ahmadreza Mosallanezhad | Yasin Silva | Deborah Hall | Huan Liu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

The element of repetition in cyberbullying behavior has directed recent computational studies toward detecting cyberbullying based on a social media session. In contrast to a single text, a session may consist of an initial post and an associated sequence of comments. Yet, emerging efforts to enhance the performance of session-based cyberbullying detection have largely overlooked unintended social biases in existing cyberbullying datasets. For example, a session containing certain demographic-identity terms (e.g., “gay” or “black”) is more likely to be classified as an instance of cyberbullying. In this paper, we first show evidence of such bias in models trained on sessions collected from different social media platforms (e.g., Instagram). We then propose a context-aware and model-agnostic debiasing strategy that leverages a reinforcement learning technique, without requiring any extra resources or annotations apart from a pre-defined set of sensitive triggers commonly used for identifying cyberbullying instances. Empirical evaluations show that the proposed strategy can simultaneously alleviate the impacts of the unintended biases and improve the detection performance.

2020

Be More with Less: Hypergraph Attention Networks for Inductive Text Classification
Kaize Ding | Jianling Wang | Jundong Li | Dingcheng Li | Huan Liu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Text classification is a critical research topic with broad applications in natural language processing. Recently, graph neural networks (GNNs) have received increasing attention in the research community and have demonstrated promising results on this canonical task. Despite this success, their performance can be largely jeopardized in practice because they are: (1) unable to capture high-order interactions between words; and (2) inefficient at handling large datasets and new documents. To address these issues, in this paper we propose a principled model, hypergraph attention networks (HyperGAT), which obtains more expressive power at lower computational cost for text representation learning. Extensive experiments on various benchmark datasets demonstrate the efficacy of the proposed approach on the text classification task.

2019

Deep Reinforcement Learning-based Text Anonymization against Private-Attribute Inference
Ahmadreza Mosallanezhad | Ghazaleh Beigi | Huan Liu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

User-generated textual data is rich in content and has been used in many user behavioral modeling tasks. However, it can also leak private-attribute information that users may not want to disclose, such as age and location. Users’ privacy concerns require data publishers to protect privacy. One effective way is to anonymize the textual data. In this paper, we study the problem of textual data anonymization and propose a novel Reinforcement Learning-based Text Anonymizor, RLTA, which addresses the problem of private-attribute leakage while preserving the utility of textual data. Our approach first extracts a latent representation of the original text w.r.t. a given task, then leverages deep reinforcement learning to automatically learn an optimal strategy for manipulating text representations w.r.t. the received privacy and utility feedback. Experiments show the effectiveness of this approach in terms of preserving both privacy and utility.

2016

A Novel Measure for Coherence in Statistical Topic Models
Fred Morstatter | Huan Liu
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2014

Finding Eyewitness Tweets During Crises
Fred Morstatter | Nichola Lubold | Heather Pon-Barry | Jürgen Pfeffer | Huan Liu
Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science

Co-authors

Kaiyu Huang (黄锴宇) 3

Joshua Garland 2

Raha Moraffah 2

Fred Morstatter 2

Ahmadreza Mosallanezhad 2

Garima Agrawal 1

Zeyad Alghamdi 1

Ghazaleh Beigi 1

Evelyn Johnson 1

Alexander Hanbo Li 1

Yang Liu (刘扬) 1

Nichola Lubold 1

Ayushi Nirmal 1

Djordje Padejski 1

Jürgen Pfeffer 1

Heather Pon-Barry 1

Kristy Roschke 1

Jianling Wang 1
