Jie Hao - ACL Anthology

Jie Hao

2026

MAC: A Multi-Agent Framework for Interactive User Clarification in Multi-turn Conversations
Emre Can Acikgoz | Jinoh Oh | Joo Hyuk Jeon | Jie Hao | Heng Ji | Dilek Hakkani-Tur | Gokhan Tur | Xiang Li | Chengyuan Ma | Xing Fan
Proceedings of the 16th International Workshop on Spoken Dialogue System Technology

Conversational agents often encounter ambiguous user requests, requiring an effective clarification to successfully complete tasks. While recent advancements in real-world applications favor multi-agent architectures to manage complex conversational scenarios efficiently, ambiguity resolution remains a critical and underexplored challenge—particularly due to the difficulty of determining which agent should initiate a clarification and how agents should coordinate their actions when faced with uncertain or incomplete user input. The fundamental questions of when to interrupt a user and how to formulate the optimal clarification query within the most optimal multi-agent settings remain open. In this paper, we propose MAC (Multi-Agent Clarification), an interactive multi-agent framework specifically optimized to resolve user ambiguities by strategically managing clarification dialogues. We first introduce a novel taxonomy categorizing user ambiguities to systematically guide clarification strategies. Then, we present MAC that autonomously coordinates multiple agents to interact synergistically with users. Empirical evaluations on MultiWOZ 2.4 demonstrate that enabling clarification at both levels increases task success rate 7.8% (54.5 → 62.3) and reduces the average number of dialogue turns (6.53 → 4.86) by eliciting all required user information up front and minimizing repetition. Our findings highlight the importance of active user interaction and role-aware clarification for more reliable human–agent communication.

SpeakRL: Synergizing Reasoning, Speaking, and Acting in Language Models with Reinforcement Learning
Emre Can Acikgoz | Jinoh Oh | Jie Hao | Joo Hyuk Jeon | Heng Ji | Dilek Hakkani-Tur | Gokhan Tur | Xiang Li | Chengyuan Ma | Xing Fan
Proceedings of the 16th International Workshop on Spoken Dialogue System Technology

Effective human-agent collaboration is increasingly prevalent in real-world applications. Current trends in such collaborations are predominantly unidirectional, with users providing instructions or posing questions to agents, where agents respond directly without seeking necessary clarifications or confirmations. However, the evolving capabilities of these agents require more proactive engagement, where agents should dynamically participate in conversations to clarify user intents, resolve ambiguities, and adapt to changing circumstances. Existing prior work under-utilize the conversational capabilities of language models (LMs), thereby optimizing agents as better followers rather than effective speakers. In this work, we introduce SpeakRL, a reinforcement learning (RL) method that enhances agents’ conversational capabilities by rewarding proactive interactions with users, such as asking right clarification questions when necessary. To support this, we curate SpeakER, a synthetic dataset that includes diverse scenarios from task-oriented dialogues, where tasks are resolved through interactive clarification questions. We present a systematic analysis of reward design for conversational proactivity and propose a principled reward formulation for teaching agents to balance asking with acting. Empirical evaluations demonstrate that our approach achieves a 20.14% absolute improvement in task completion over base models without increasing conversation turns even surpassing even much larger proprietary models, demonstrating the promise of clarification-centric user-agent interactions.

2025

StoC-TOT: Stochastic Tree-of-Thought with Constrained Decoding for Complex Reasoning in Multi-Hop Question Answering
Zhenyu Bi | Daniel Hajialigol | Zhongkai Sun | Jie Hao | Xuan Wang
Proceedings of the 4th International Workshop on Knowledge-Augmented Methods for Natural Language Processing

Multi-hop question answering (MHQA) requires a model to retrieve and integrate information from multiple passages to answer a complex question. Recent systems leverage the power of large language models and integrate evidence retrieval with reasoning prompts (e.g., chain-of-thought reasoning) for the MHQA task. However, the complexities in the question types (bridge v.s. comparison questions) and the reasoning types (sequential v.s. parallel reasonings) require more novel and fine-grained prompting methods to enhance the performance of MHQA under the zero-shot setting.In this paper, we propose StoC-ToT, a stochastic tree-of-thought reasoning prompting method with constrained decoding for MHQA and conduct a detailed comparison with other reasoning prompts on different question types and reasoning types. Specifically, we construct a tree-like reasoning structure by prompting the model to break down the original question into smaller sub-questions to form different reasoning paths. In addition, we prompt the model to provide a probability estimation for each reasoning path at each reasoning step. At answer time, we conduct constrained decoding on the model to generate more grounded answers and reduce hallucination. Experiments comparing StoC-ToT with on two MHQA datasets and five large language models showed that outperforms other reasoning prompts by a significant margin.

2023

Unified Contextual Query Rewriting
Yingxue Zhou | Jie Hao | Mukund Rungta | Yang Liu | Eunah Cho | Xing Fan | Yanbin Lu | Vishal Vasudevan | Kellen Gillespie | Zeynab Raeesy
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)

Query rewriting (QR) is an important technique for user friction (i.e. recovering ASR error or system error) reduction and contextual carryover (i.e. ellipsis and co-reference) in conversational AI systems. Recently, generation-based QR models have achieved promising results on these two tasks separately. Although these two tasks have many similarities such as they both use the previous dialogue along with the current request as model input, there is no unified model to solve them jointly. To this end, we propose a unified contextual query rewriting model that unifies QR for both reducing friction and contextual carryover purpose. Moreover, we involve multiple auxiliary tasks such as trigger prediction and NLU interpretation tasks to boost the performance of the rewrite. We leverage the text-to-text unified framework which uses independent tasks with weighted loss to account for task importance. Then we propose new unified multitask learning strategies including a sequential model which outputs one sentence for multi-tasks, and a hybrid model where some tasks are independent and some tasks are sequentially generated. Our experimental results demonstrate the effectiveness of the proposed unified learning methods.

Improving Contextual Query Rewrite for Conversational AI Agents through User-preference Feedback Learning
Zhongkai Sun | Yingxue Zhou | Jie Hao | Xing Fan | Yanbin Lu | Chengyuan Ma | Wei Shen | Chenlei Guo
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track

Contextual query rewriting (CQR) is a crucial component in Conversational AI agents, leveraging the contextual information from previous user-agent conversations to improve the comprehension of current user intent. However, traditional CQR methods often concentrate on supervised fine-tuning only, neglecting the opportunities to learn from user feedback to align with user preferences. Inspired by recent advances in learning from human feedback (LHF), this paper proposes a novel Preference Aligned Contextual Query Rewriting (PA-CQR) framework to enhance the CQR model’s capability in generating user preference-aligned rewrites. This paper also investigates the efficacy of various state-of-the-art feedback learning algorithms on the CQR task, and proposes a novel Dynamic Direct Preference Optimization (Dynamic DPO) algorithm to better adapt the DPO algorithm to large-scale CQR training. Experiments on large-scale real-world CQR data set demonstrate the superiority of the proposed PA-CQR framework and the Dynamic DPO.

2022

PENTATRON: PErsonalized coNText-Aware Transformer for Retrieval-based cOnversational uNderstanding
Niranjan Uma Naresh | Ziyan Jiang | Ankit | Sungjin Lee | Jie Hao | Xing Fan | Chenlei Guo
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track

Conversational understanding is an integral part of modern intelligent devices. In a large fraction of the global traffic from customers using smart digital assistants, frictions in dialogues may be attributed to incorrect understanding of the entities in a customer’s query due to factors including ambiguous mentions, mispronunciation, background noise and faulty on-device signal processing. Such errors are compounded by two common deficiencies from intelligent devices namely, (1) the device not being tailored to individual customers, and (2) the device responses being unaware of the context in the conversation session. Viewing this problem via the lens of retrieval-based search engines, we build and evaluate a scalable entity correction system, PENTATRON. The system leverages a parametric transformer-based language model to learn patterns from in-session customer-device interactions coupled with a non-parametric personalized entity index to compute the correct query, which aids downstream components in reasoning about the best response. In addition to establishing baselines and demonstrating the value of personalized and context-aware systems, we use multitasking to learn the domain of the correct entity. We also investigate the utility of language model prompts. Through extensive experiments, we show a significant upward movement of the key metric (Exact Match) by up to 500.97% (relative to the baseline).

PAIGE: Personalized Adaptive Interactions Graph Encoder for Query Rewriting in Dialogue Systems
Daniel Biś | Saurabh Gupta | Jie Hao | Xing Fan | Chenlei Guo
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track

Unexpected responses or repeated clarification questions from conversational agents detract from the users’ experience with technology meant to streamline their daily tasks. To reduce these frictions, Query Rewriting (QR) techniques replace transcripts of faulty queries with alternatives that lead to responses thatsatisfy the users’ needs. Despite their successes, existing QR approaches are limited in their ability to fix queries that require considering users’ personal preferences. We improve QR by proposing Personalized Adaptive Interactions Graph Encoder (PAIGE).PAIGE is the first QR architecture that jointly models user’s affinities and query semantics end-to-end. The core idea is to represent previous user-agent interactions and world knowledge in a structured form — a heterogeneous graph — and apply message passing to propagate latent representations of users’ affinities to refine utterance embeddings.Using these embeddings, PAIGE can potentially provide different rewrites given the same query for users with different preferences. Our model, trained without any human-annotated data, improves the rewrite retrieval precision of state-of-the-art baselines by 12.5–17.5% while having nearly ten times fewer parameters.

CGF: Constrained Generation Framework for Query Rewriting in Conversational AI
Jie Hao | Yang Liu | Xing Fan | Saurabh Gupta | Saleh Soltan | Rakesh Chada | Pradeep Natarajan | Chenlei Guo | Gokhan Tur
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track

In conversational AI agents, Query Rewriting (QR) plays a crucial role in reducing user frictions and satisfying their daily demands. User frictions are caused by various reasons, such as errors in the conversational AI system, users’ accent or their abridged language. In this work, we present a novel Constrained Generation Framework (CGF) for query rewriting at both global and personalized levels. It is based on the encoder-decoder framework, where the encoder takes the query and its previous dialogue turns as the input to form a context-enhanced representation, and the decoder uses constrained decoding to generate the rewrites based on the pre-defined global or personalized constrained decoding space. Extensive offline and online A/B experiments show that the proposed CGF significantly boosts the query rewriting performance.

Overcoming Catastrophic Forgetting During Domain Adaptation of Seq2seq Language Generation
Dingcheng Li | Zheng Chen | Eunah Cho | Jie Hao | Xiaohu Liu | Fan Xing | Chenlei Guo | Yang Liu
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Seq2seq language generation models that are trained offline with multiple domains in a sequential fashion often suffer from catastrophic forgetting. Lifelong learning has been proposed to handle this problem. However, existing work such as experience replay or elastic weighted consolidation requires incremental memory space. In this work, we propose an innovative framework, RMR_DSEthat leverages a recall optimization mechanism to selectively memorize important parameters of previous tasks via regularization, and uses a domain drift estimation algorithm to compensate the drift between different do-mains in the embedding space. These designs enable the model to be trained on the current task while keep-ing the memory of previous tasks, and avoid much additional data storage. Furthermore, RMR_DSE can be combined with existing lifelong learning approaches. Our experiments on two seq2seq language generation tasks, paraphrase and dialog response generation, show thatRMR_DSE outperforms SOTA models by a considerable margin and reduces forgetting greatly.

2021

Contextual Rephrase Detection for Reducing Friction in Dialogue Systems
Zhuoyi Wang | Saurabh Gupta | Jie Hao | Xing Fan | Dingcheng Li | Alexander Hanbo Li | Chenlei Guo
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

For voice assistants like Alexa, Google Assistant, and Siri, correctly interpreting users’ intentions is of utmost importance. However, users sometimes experience friction with these assistants, caused by errors from different system components or user errors such as slips of the tongue. Users tend to rephrase their queries until they get a satisfactory response. Rephrase detection is used to identify the rephrases and has long been treated as a task with pairwise input, which does not fully utilize the contextual information (e.g. users’ implicit feedback). To this end, we propose a contextual rephrase detection model ContReph to automatically identify rephrases from multi-turn dialogues. We showcase how to leverage the dialogue context and user-agent interaction signals, including the user’s implicit feedback and the time gap between different turns, which can help significantly outperform the pairwise rephrase detection models.

RAST: Domain-Robust Dialogue Rewriting as Sequence Tagging
Jie Hao | Linfeng Song | Liwei Wang | Kun Xu | Zhaopeng Tu | Dong Yu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

The task of dialogue rewriting aims to reconstruct the latest dialogue utterance by copying the missing content from the dialogue context. Until now, the existing models for this task suffer from the robustness issue, i.e., performances drop dramatically when testing on a different dataset. We address this robustness issue by proposing a novel sequence-tagging-based model so that the search space is significantly reduced, yet the core of this task is still well covered. As a common issue of most tagging models for text generation, the model’s outputs may lack fluency. To alleviate this issue, we inject the loss signal from BLEU or GPT-2 under a REINFORCE framework. Experiments show huge improvements of our model over the current state-of-the-art systems when transferring to another dataset.

Personalized Search-based Query Rewrite System for Conversational AI
Eunah Cho | Ziyan Jiang | Jie Hao | Zheng Chen | Saurabh Gupta | Xing Fan | Chenlei Guo
Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI

Query rewrite (QR) is an emerging component in conversational AI systems, reducing user defect. User defect is caused by various reasons, such as errors in the spoken dialogue system, users’ slips of the tongue or their abridged language. Many of the user defects stem from personalized factors, such as user’s speech pattern, dialect, or preferences. In this work, we propose a personalized search-based QR framework, which focuses on automatic reduction of user defect. We build a personalized index for each user, which encompasses diverse affinity layers to reflect personal preferences for each user in the conversational AI. Our personalized QR system contains retrieval and ranking layers. Supported by user feedback based learning, training our models does not require hand-annotated data. Experiments on personalized test set showed that our personalized QR system is able to correct systematic and user errors by utilizing phonetic and semantic inputs.

The Mininglamp Machine Translation System for WMT21
Shiyu Zhao | Xiaopu Li | Minghui Wu | Jie Hao
Proceedings of the Sixth Conference on Machine Translation

This paper describes Mininglamp neural machine translation systems of the WMT2021 news translation tasks. We have participated in eight directions translation tasks for news text including Chinese to/from English, Hausa to/from English, German to/from English and French to/from German. Our fundamental system was based on Transformer architecture, with wider or smaller construction for different news translation tasks. We mainly utilized the method of back-translation, knowledge distillation and fine-tuning to boost single model, while the ensemble was used to combine single models. Our final submission has ranked first for the English to/from Hausa task.

2020

OPPO’s Machine Translation System for the IWSLT 2020 Open Domain Translation Task
Qian Zhang | Xiaopu Li | Dawei Dang | Tingxun Shi | Di Ai | Zhengshan Xue | Jie Hao
Proceedings of the 17th International Conference on Spoken Language Translation

In this paper, we demonstrate our machine translation system applied for the Chinese-Japanese bidirectional translation task (aka. open domain translation task) for the IWSLT 2020. Our model is based on Transformer (Vaswani et al., 2017), with the help of many popular, widely proved effective data preprocessing and augmentation methods. Experiments show that these methods can improve the baseline model steadily and significantly.

XLP at SemEval-2020 Task 9: Cross-lingual Models with Focal Loss for Sentiment Analysis of Code-Mixing Language
Yili Ma | Liang Zhao | Jie Hao
Proceedings of the Fourteenth Workshop on Semantic Evaluation

In this paper, we present an approach for sentiment analysis in code-mixed language on twitter defined in SemEval-2020 Task 9. Our team (referred as LiangZhao) employ different multilingual models with weighted loss focused on complexity of code-mixing in sentence, in which the best model achieved f1-score of 0.806 and ranked 1st of subtask- Sentimix Spanglish. The performance of method is analyzed and each component of our architecture is demonstrated.

OPPO’s Machine Translation Systems for WMT20
Tingxun Shi | Shiyu Zhao | Xiaopu Li | Xiaoxue Wang | Qian Zhang | Di Ai | Dawei Dang | Xue Zhengshan | Jie Hao
Proceedings of the Fifth Conference on Machine Translation

In this paper we demonstrate our (OPPO’s) machine translation systems for the WMT20 Shared Task on News Translation for all the 22 language pairs. We will give an overview of the common aspects across all the systems firstly, including two parts: the data preprocessing part will show how the data are preprocessed and filtered, and the system part will show our models architecture and the techniques we followed. Detailed information, such as training hyperparameters and the results generated by each technique will be depicted in the corresponding subsections. Our final submissions ranked top in 6 directions (English ↔ Czech, English ↔ Russian, French → German and Tamil → English), third in 2 directions (English → German, English → Japanese), and fourth in 2 directions (English → Pashto and and English → Tamil).

2019

OPPO NMT System for IWSLT 2019
Xiaopu Li | Zhengshan Xue | Jie Hao
Proceedings of the 16th International Conference on Spoken Language Translation

This paper illustrates the OPPO's submission for IWSLT2019 text translation task Our system is based on Transformer architecture. Besides, we also study the effect of model ensembling. On the devsets of IWSLT 2019, the BLEU of our system reaches 19.94.

Multi-Granularity Self-Attention for Neural Machine Translation
Jie Hao | Xing Wang | Shuming Shi | Jinfeng Zhang | Zhaopeng Tu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Current state-of-the-art neural machine translation (NMT) uses a deep multi-head self-attention network with no explicit phrase information. However, prior work on statistical machine translation has shown that extending the basic translation unit from words to phrases has produced substantial improvements, suggesting the possibility of improving NMT performance from explicit modeling of phrases. In this work, we present multi-granularity self-attention (Mg-Sa): a neural network that combines multi-head self-attention and phrase modeling. Specifically, we train several attention heads to attend to phrases in either n-gram or syntactic formalisms. Moreover, we exploit interactions among phrases to enhance the strength of structure modeling – a commonly-cited weakness of self-attention. Experimental results on WMT14 English-to-German and NIST Chinese-to-English translation tasks show the proposed approach consistently improves performance. Targeted linguistic analysis reveal that Mg-Sa indeed captures useful phrase information at various levels of granularities.

Towards Better Modeling Hierarchical Structure for Self-Attention with Ordered Neurons
Jie Hao | Xing Wang | Shuming Shi | Jinfeng Zhang | Zhaopeng Tu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Recent studies have shown that a hybrid of self-attention networks (SANs) and recurrent neural networks RNNs outperforms both individual architectures, while not much is known about why the hybrid models work. With the belief that modeling hierarchical structure is an essential complementary between SANs and RNNs, we propose to further enhance the strength of hybrid models with an advanced variant of RNNs – Ordered Neurons LSTM (ON-LSTM), which introduces a syntax-oriented inductive bias to perform tree-like composition. Experimental results on the benchmark machine translation task show that the proposed approach outperforms both individual architectures and a standard hybrid model. Further analyses on targeted linguistic evaluation and logical inference tasks demonstrate that the proposed approach indeed benefits from a better modeling of hierarchical structure.

Modeling Recurrence for Transformer
Jie Hao | Xing Wang | Baosong Yang | Longyue Wang | Jinfeng Zhang | Zhaopeng Tu
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Recently, the Transformer model that is based solely on attention mechanisms, has advanced the state-of-the-art on various machine translation tasks. However, recent studies reveal that the lack of recurrence modeling hinders its further improvement of translation capacity. In response to this problem, we propose to directly model recurrence for Transformer with an additional recurrence encoder. In addition to the standard recurrent neural network, we introduce a novel attentive recurrent network to leverage the strengths of both attention models and recurrent networks. Experimental results on the widely-used WMT14 English⇒German and WMT17 Chinese⇒English translation tasks demonstrate the effectiveness of the proposed approach. Our studies also reveal that the proposed model benefits from a short-cut that bridges the source and target sequences with a single recurrent layer, which outperforms its deep counterpart.

2015

Well-Formed Dependency to String translation with BTG Grammar
Xiaoqing Li | Kun Wang | Dakun Zhang | Jie Hao
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation

2014

Query Lattice for Translation Retrieval
Meiping Dong | Yong Cheng | Yang Liu | Jia Xu | Maosong Sun | Tatsuya Izuha | Jie Hao
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Co-authors

Jinfeng Zhang 3

Emre Can Acikgoz 2

Dilek Hakkani-Tur 2

Joo Hyuk Jeon 2

Yang Liu (刘扬) 2

Zhengshan Xue 2

Kellen Gillespie 1

Daniel Hajialigol 1

Tatsuya Izuha 1

Alexander Hanbo Li 1

Yang Liu (刘洋) 1

Niranjan Uma Naresh 1

Pradeep Natarajan 1

Zeynab Raeesy 1

Mukund Rungta 1

Vishal Vasudevan 1

Dong Yu (于东) 1

Liang Zhao (赵亮) 1

Xue Zhengshan 1

Venues