2024
pdf
bib
abs
Strength Lies in Differences! Improving Strategy Planning for Non-collaborative Dialogues via Diversified User Simulation
Tong Zhang
|
Chen Huang
|
Yang Deng
|
Hongru Liang
|
Jia Liu
|
Zujie Wen
|
Wenqiang Lei
|
Tat-Seng Chua
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
We investigate non-collaborative dialogue agents, which are expected to engage in strategic conversations with diverse users, for securing a mutual agreement that leans favorably towards the system’s objectives. This poses two main challenges for existing dialogue agents: 1) The inability to integrate user-specific characteristics into the strategic planning, and 2) The difficulty of training strategic planners that can be generalized to diverse users. To address these challenges, we propose TRIP to enhance the capability in tailored strategic planning, incorporating a user-aware strategic planning module and a population-based training paradigm. Through experiments on benchmark non-collaborative dialogue tasks, we demonstrate the effectiveness of TRIP in catering to diverse users.
pdf
bib
abs
STYLE: Improving Domain Transferability of Asking Clarification Questions in Large Language Model Powered Conversational Agents
Yue Chen
|
Chen Huang
|
Yang Deng
|
Wenqiang Lei
|
Dingnan Jin
|
Jia Liu
|
Tat-Seng Chua
Findings of the Association for Computational Linguistics: ACL 2024
Equipping a conversational search engine with strategies regarding when to ask clarification questions is becoming increasingly important across various domains. Attributing to the context understanding capability of LLMs and their access to domain-specific sources of knowledge, LLM-based clarification strategies feature rapid transfer to various domains in a post-hoc manner.However, they still struggle to deliver promising performance on unseen domains, struggling to achieve effective domain transferability.We take the first step to investigate this issue and existing methods tend to produce one-size-fits-all strategies across diverse domains, limiting their search effectiveness.In response, we introduce a novel method, called STYLE,to achieve effective domain transferability.Our experimental results indicate that STYLE bears strong domain transferability, resulting in an average search performance improvement of 10% on four unseen domains.
pdf
bib
abs
Selective Annotation via Data Allocation: These Data Should Be Triaged to Experts for Annotation Rather Than the Model
Chen Huang
|
Yang Deng
|
Wenqiang Lei
|
Jiancheng Lv
|
Ido Dagan
Findings of the Association for Computational Linguistics: EMNLP 2024
To obtain high-quality annotations under limited budget, semi-automatic annotation methods are commonly used, where a portion of the data is annotated by experts and a model is then trained to complete the annotations for the remaining data. However, these methods mainly focus on selecting informative data for expert annotations to improve the model predictive ability (i.e., triage-to-human data), while the rest of the data is indiscriminately assigned to model annotation (i.e., triage-to-model data). This may lead to inefficiencies in budget allocation for annotations, as easy data that the model could accurately annotate may be unnecessarily assigned to the expert, and hard data may be misclassified by the model. As a result, the overall annotation quality may be compromised. To address this issue, we propose a selective annotation framework called SANT. It effectively takes advantage of both the triage-to-human and triage-to-model data through the proposed error-aware triage and bi-weighting mechanisms. As such, informative or hard data is assigned to the expert for annotation, while easy data is handled by the model. Experimental results show that SANT consistently outperforms other baselines, leading to higher-quality annotation through its proper allocation of data to both expert and model workers. We provide pioneering work on data annotation within budget constraints, establishing a landmark for future triage-based annotation studies.
pdf
bib
abs
Beyond Persuasion: Towards Conversational Recommender System with Credible Explanations
Peixin Qin
|
Chen Huang
|
Yang Deng
|
Wenqiang Lei
|
Tat-Seng Chua
Findings of the Association for Computational Linguistics: EMNLP 2024
With the aid of large language models, current conversational recommender system (CRS) has gaining strong abilities to persuade users to accept recommended items. While these CRSs are highly persuasive, they can mislead users by incorporating incredible information in their explanations, ultimately damaging the long-term trust between users and the CRS. To address this, we propose a simple yet effective method, called PC-CRS, to enhance the credibility of CRS’s explanations during persuasion. It guides the explanation generation through our proposed credibility-aware persuasive strategies and then gradually refines explanations via post-hoc self-reflection. Experimental results demonstrate the efficacy of PC-CRS in promoting persuasive and credible explanations. Further analysis reveals the reason behind current methods producing incredible explanations and the potential of credible explanations to improve recommendation accuracy.
pdf
bib
abs
On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization
Yong Lin
|
Skyler Seto
|
Maartje Ter Hoeve
|
Katherine Metcalf
|
Barry-John Theobald
|
Xuan Wang
|
Yizhe Zhang
|
Chen Huang
|
Tong Zhang
Findings of the Association for Computational Linguistics: EMNLP 2024
Reinforcement Learning from Human Feedback (RLHF) is an effective approach for aligning language models to human preferences. Central to RLHF is learning a reward function for scoring human preferences. Two main approaches for learning a reward model are 1) training an EXplicit Reward Model (EXRM) as in RLHF, and 2) using an implicit reward learned from preference data through methods such as Direct Preference Optimization (DPO). Prior work has shown that the implicit reward model of DPO (denoted as DPORM) can approximate an EXRM on the limit infinite samples. However, it is unclear how effective is DPORM in practice. DPORM’s effectiveness directly implies the optimality of learned policy of DPO and also has practical implication for more advanced alignment methods, such as iterative DPO. We compare the accuracy at distinguishing preferred and rejected answers using both DPORM and EXRM. Our findings indicate that even though DPORM can fit the training dataset, it generalizes less effective than EXRM, especially when the validation datasets contain distributional shifts. Across five out-of-distribution settings, DPORM has a mean drop in accuracy of 3% and a maximum drop of 7%. These findings highlight that DPORM has limited generalization ability and substantiates the integration of an explicit reward model in iterative DPO approaches.
pdf
bib
abs
ARAIDA: Analogical Reasoning-Augmented Interactive Data Annotation
Chen Huang
|
Yiping Jin
|
Ilija Ilievski
|
Wenqiang Lei
|
Jiancheng Lv
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Human annotation is a time-consuming task that requires a significant amount of effort. To address this issue, interactive data annotation utilizes an annotation model to provide suggestions for humans to approve or correct. However, annotation models trained with limited labeled data are prone to generating incorrect suggestions, leading to extra human correction effort. To tackle this challenge, we propose Araida, an analogical reasoning-based approach that enhances automatic annotation accuracy in the interactive data annotation setting and reduces the need for human corrections. Araida involves an error-aware integration strategy that dynamically coordinates an annotation model and a k-nearest neighbors (KNN) model, giving more importance to KNN’s predictions when predictions from the annotation model are deemed inaccurate. Empirical studies demonstrate that Araida is adaptable to different annotation tasks and models. On average, it reduces human correction labor by 11.02% compared to vanilla interactive data annotation methods.
pdf
bib
abs
CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models
Tong Zhang
|
Peixin Qin
|
Yang Deng
|
Chen Huang
|
Wenqiang Lei
|
Junhong Liu
|
Dingnan Jin
|
Hongru Liang
|
Tat-Seng Chua
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) are increasingly used to meet user information needs, but their effectiveness in dealing with user queries that contain various types of ambiguity remains unknown, ultimately risking user trust and satisfaction. To this end, we introduce CLAMBER, a benchmark for evaluating LLMs using a well-organized taxonomy. Building upon the taxonomy, we construct 12K high-quality data to assess the strengths, weaknesses, and potential risks of various off-the-shelf LLMs.Our findings indicate the limited practical utility of current LLMs in identifying and clarifying ambiguous user queries, even enhanced by chain-of-thought (CoT) and few-shot prompting. These techniques may result in overconfidence in LLMs and yield only marginal enhancements in identifying ambiguity. Furthermore, current LLMs fall short in generating high-quality clarifying questions due to a lack of conflict resolution and inaccurate utilization of inherent knowledge.In this paper, CLAMBER presents a guidance and promotes further research on proactive and trustworthy LLMs.
2023
pdf
bib
abs
TRAVEL: Tag-Aware Conversational FAQ Retrieval via Reinforcement Learning
Yue Chen
|
Dingnan Jin
|
Chen Huang
|
Jia Liu
|
Wenqiang Lei
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Efficiently retrieving FAQ questions that match users’ intent is essential for online customer service. Existing methods aim to fully utilize the dynamic conversation context to enhance the semantic association between the user query and FAQ questions. However, the conversation context contains noise, e.g., users may click questions they don’t like, leading to inaccurate semantics modeling. To tackle this, we introduce tags of FAQ questions, which can help us eliminate irrelevant information. We later integrate them into a reinforcement learning framework and minimize the negative impact of irrelevant information in the dynamic conversation context. We experimentally demonstrate our efficiency and effectiveness on conversational FAQ retrieval compared to other baselines.
pdf
bib
abs
Reduce Human Labor On Evaluating Conversational Information Retrieval System: A Human-Machine Collaboration Approach
Chen Huang
|
Peixin Qin
|
Wenqiang Lei
|
Jiancheng Lv
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Evaluating conversational information retrieval (CIR) systems is a challenging task that requires a significant amount of human labor for annotation. It is imperative to invest significant effort into researching more labor-effective methods for evaluating CIR systems. To touch upon this challenge, we take the first step to involve active testing in CIR evaluation and propose a novel method, called HomCoE. It strategically selects a few data for human annotation, then calibrates the evaluation results to eliminate evaluation biases. As such, it makes an accurate evaluation of the CIR system at low human labor. We experimentally reveal that it consumes less than 1% of human labor and achieves a consistency rate of 95%-99% with human evaluation results. This emphasizes the superiority of our method over other baselines.
pdf
bib
abs
Towards Effective Automatic Debt Collection with Persona Awareness
Tong Zhang
|
Junhong Liu
|
Chen Huang
|
Jia Liu
|
Hongru Liang
|
Zujie Wen
|
Wenqiang Lei
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
Understanding debtor personas is crucial for collectors to empathize with debtors and develop more effective collection strategies. In this paper, we take the first step towards comprehensively investigating the significance of debtor personas and present a successful commercial practice on automatic debt collection agents. Specifically, we organize the debtor personas into a taxonomy and construct a persona-aware conversation dataset. Building upon it, we implement a simple yet effective persona-aware agent called PAD. After two-month online testing, PAD increases the recovery rate by 3.31% and collects an additional ~100K RMB. Our commercial practice brings inspiration to the debt collection industry by providing an effective automatic solution.
2020
pdf
bib
abs
Neural-DINF: A Neural Network based Framework for Measuring Document Influence
Jie Tan
|
Changlin Yang
|
Ying Li
|
Siliang Tang
|
Chen Huang
|
Yueting Zhuang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Measuring the scholarly impact of a document without citations is an important and challenging problem. Existing approaches such as Document Influence Model (DIM) are based on dynamic topic models, which only consider the word frequency change. In this paper, we use both frequency changes and word semantic shifts to measure document influence by developing a neural network framework. Our model has three steps. Firstly, we train the word embeddings for different time periods. Subsequently, we propose an unsupervised method to align vectors for different time periods. Finally, we compute the influence value of documents. Our experimental results show that our model outperforms DIM.