Hongliang He


2024

pdf bib
WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
Hongliang He | Wenlin Yao | Kaixin Ma | Wenhao Yu | Yong Dai | Hongming Zhang | Zhenzhong Lan | Dong Yu
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The rapid advancement of large language models (LLMs) has led to a new era marked by the development of autonomous applications in real-world scenarios, which drives innovation in creating advanced web agents. Existing web agents typically only handle one input modality and are evaluated only in simplified web simulators or static web snapshots, greatly limiting their applicability in real-world scenarios. To bridge this gap, we introduce WebVoyager, an innovative Large Multimodal Model (LMM) powered web agent that can complete user instructions end-to-end by interacting with real-world websites. Moreover, we establish a new benchmark by compiling real-world tasks from 15 popular websites and introduce an automatic evaluation protocol leveraging multimodal understanding abilities of GPT-4V to evaluate open-ended web agents. We show that WebVoyager achieves a 59.1% task success rate on our benchmark, significantly surpassing the performance of both GPT-4 (All Tools) and the WebVoyager (text-only) setups, underscoring the exceptional capability of WebVoyager. The proposed automatic evaluation metric achieves 85.3% agreement with human judgment, indicating its effectiveness in providing reliable and accurate assessments of web agents.

pdf bib
SMILE: Single-turn to Multi-turn Inclusive Language Expansion via ChatGPT for Mental Health Support
Huachuan Qiu | Hongliang He | Shuai Zhang | Anqi Li | Zhenzhong Lan
Findings of the Association for Computational Linguistics: EMNLP 2024

Developing specialized dialogue systems for mental health support requires multi-turn conversation data, which has recently garnered increasing attention. However, gathering and releasing large-scale, real-life multi-turn conversations that could facilitate advancements in mental health support presents challenges in data privacy protection and the time and cost involved in crowdsourcing. To address these challenges, we introduce SMILE, a single-turn to multi-turn inclusive language expansion technique that prompts ChatGPT to rewrite public single-turn dialogues into multi-turn ones. Our work begins by analyzing language transformation and validating the feasibility of our proposed method. We conduct a study on dialogue diversity, including lexical features, semantic features, and dialogue topics, demonstrating the effectiveness of our method. Further, we employ our method to generate a large-scale, lifelike, and diverse dialogue dataset named SMILECHAT, consisting of 55k dialogues. Finally, we utilize the collected corpus to develop a mental health chatbot, MeChat. To better assess the quality of SMILECHAT, we collect a small-scale real-life counseling dataset conducted by data anonymization. Both automatic and human evaluations demonstrate significant improvements in our dialogue system and confirm that SMILECHAT is high-quality. Code, data, and model are publicly available at https://github.com/qiuhuachuan/smile.

2023

pdf bib
Understanding Client Reactions in Online Mental Health Counseling
Anqi Li | Lizhi Ma | Yaling Mei | Hongliang He | Shuai Zhang | Huachuan Qiu | Zhenzhong Lan
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Communication success relies heavily on reading participants’ reactions. Such feedback is especially important for mental health counselors, who must carefully consider the client’s progress and adjust their approach accordingly. However, previous NLP research on counseling has mainly focused on studying counselors’ intervention strategies rather than their clients’ reactions to the intervention. This work aims to fill this gap by developing a theoretically grounded annotation framework that encompasses counselors’ strategies and client reaction behaviors. The framework has been tested against a large-scale, high-quality text-based counseling dataset we collected over the past two years from an online welfare counseling platform. Our study show how clients react to counselors’ strategies, how such reactions affect the final counseling outcomes, and how counselors can adjust their strategies in response to these reactions. We also demonstrate that this study can help counselors automatically predict their clients’ states.

pdf bib
Uni-Encoder: A Fast and Accurate Response Selection Paradigm for Generation-Based Dialogue Systems
Chiyu Song | Hongliang He | Haofei Yu | Pengfei Fang | Leyang Cui | Zhenzhong Lan
Findings of the Association for Computational Linguistics: ACL 2023

Sample-and-rank is a key decoding strategy for modern generation-based dialogue systems. It helps achieve diverse and high-quality responses by selecting an answer from a small pool of generated candidates. The current state-of-the-art ranking methods mainly use an encoding paradigm called Cross-Encoder, which separately encodes each context-candidate pair and ranks the candidates according to their fitness scores. However, Cross-Encoder repeatedly encodes the same lengthy context for each candidate, resulting in high computational costs. Poly-Encoder addresses the above problems by reducing the interaction between context and candidates, but with a price of performance drop. In this work, we develop a new paradigm called Uni-Encoder, that keeps the full attention over each pair as in Cross-Encoder while only encoding the context once, as in Poly-Encoder. Uni-Encoder encodes all the candidates with the context in one forward pass. We use the same positional embedding for all candidates to ensure they are treated equally and design a new attention mechanism to avoid confusion. Our Uni-Encoder can simulate other ranking paradigms using different attention and response concatenation methods. Extensive experiments show that our proposed paradigm achieves new state-of-the-art results on four benchmark datasets with high computational efficiency. For instance, it improves R10@1 by 2.9% with an approximately 4X faster inference speed on the Ubuntu V2 dataset.