Jian Wang

Also published as:


2024

pdf bib
Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue
Jian Wang | Chak Tou Leong | Jiashuo Wang | Dongding Lin | Wenjie Li | Xiaoyong Wei
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Tuning language models for dialogue generation has been a prevalent paradigm for building capable dialogue agents. Yet, traditional tuning narrowly views dialogue generation as resembling other language generation tasks, ignoring the role disparities between two speakers and the multi-round interactive process that dialogues ought to be. Such a manner often leads to unsatisfactory chat consistency for the built agent. In this work, we emphasize the interactive, communicative nature of dialogue and argue that it is more feasible to model the speaker roles of agent and user separately, enabling the agent to adhere to its role consistently. With this in mind, we propose an efficient Multi-round Interactive Dialogue Tuning (Midi-Tuning) framework. It models the agent and user individually with two adapters built upon large language models. The adapters make use of respective utterances round by round in alternating order and they are tuned via a round-level memory caching mechanism. Extensive experiments demonstrate that, our framework performs superior to traditional fine-tuning and harbors the tremendous potential for improving dialogue consistency.

pdf bib
ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models
Haiquan Zhao | Lingyu Li | Shisong Chen | Shuqi Kong | Jiaan Wang | Kexin Huang | Tianle Gu | Yixu Wang | Jian Wang | Liang Dandan | Zhixu Li | Yan Teng | Yanghua Xiao | Yingchun Wang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Emotion Support Conversation (ESC) is a crucial application, which aims to reduce human stress, offer emotional guidance, and ultimately enhance human mental and physical well-being. With the advancement of Large Language Models (LLMs), many researchers have employed LLMs as the ESC models. However, the evaluation of these LLM-based ESCs remains uncertain. In detail, we first re-organize 2,801 role-playing cards from seven existing datasets to define the roles of the role-playing agent. Second, we train a specific role-playing model called ESC-Role which behaves more like a confused person than GPT-4. Third, through ESC-Role and organized role cards, we systematically conduct experiments using 14 LLMs as the ESC models, including general AI-assistant LLMs (e.g., ChatGPT) and ESC-oriented LLMs (e.g., ExTES-Llama). We conduct comprehensive human annotations on interactive multi-turn dialogues of different ESC models. The results show that ESC-oriented LLMs exhibit superior ESC abilities compared to general AI-assistant LLMs, but there is still a gap behind human performance. Moreover, to automate the scoring process for future ESC models, we developed ESC-RANK, which trained on the annotated data, achieving a scoring performance surpassing 35 points of GPT-4.

pdf bib
E2CL: Exploration-based Error Correction Learning for Embodied Agents
Hanlin Wang | Chak Tou Leong | Jian Wang | Wenjie Li
Findings of the Association for Computational Linguistics: EMNLP 2024

Language models are exhibiting increasing capability in knowledge utilization and reasoning. However, when applied as agents in embodied environments, they often suffer from misalignment between their intrinsic knowledge and environmental knowledge, leading to infeasible actions. Traditional environment alignment methods, such as supervised learning on expert trajectories and reinforcement learning, encounter limitations in covering environmental knowledge and achieving efficient convergence, respectively. Inspired by human learning, we propose Exploration-based Error Correction Learning (E2CL), a novel framework that leverages exploration-induced errors and environmental feedback to enhance environment alignment for embodied agents. E2CL incorporates teacher-guided and teacher-free explorations to gather environmental feedback and correct erroneous actions. The agent learns to provide feedback and self-correct, thereby enhancing its adaptability to target environments. Extensive experiments in the VirtualHome environment demonstrate that E2CL-trained agents outperform those trained by baseline methods and exhibit superior self-correction capabilities.

pdf bib
CoT-based Data Augmentation Strategy for Persuasion Techniques Detection
Dailin Li | Chuhan Wang | Xin Zou | Junlong Wang | Peng Chen | Jian Wang | Liang Yang | Hongfei Lin
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

Detecting persuasive communication is an important topic in Natural Language Processing (NLP), as it can be useful in identifying fake information on social media. We have developed a system to identify applied persuasion techniques in text fragments across four languages: English, Bulgarian, North Macedonian, and Arabic. Our system uses data augmentation methods and employs an ensemble strategy that combines the strengths of both RoBERTa and DeBERTa models. Due to limited resources, we concentrated solely on task 1, and our solution achieved the top ranking in the English track during the official assessments. We also analyse the impact of architectural decisions, data constructionand training strategies.

2023

pdf bib
Target-oriented Proactive Dialogue Systems with Personalization: Problem Formulation and Dataset Curation
Jian Wang | Yi Cheng | Dongding Lin | Chak Leong | Wenjie Li
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Target-oriented dialogue systems, designed to proactively steer conversations toward predefined targets or accomplish specific system-side goals, are an exciting area in conversational AI. In this work, by formulating a <dialogue act, topic> pair as the conversation target, we explore a novel problem of personalized target-oriented dialogue by considering personalization during the target accomplishment process. However, there remains an emergent need for high-quality datasets, and building one from scratch requires tremendous human effort. To address this, we propose an automatic dataset curation framework using a role-playing approach. Based on this framework, we construct a large-scale personalized target-oriented dialogue dataset, TopDial, which comprises about 18K multi-turn dialogues. The experimental results show that this dataset is of high quality and could contribute to exploring personalized target-oriented dialogue.

pdf bib
Self-Detoxifying Language Models via Toxification Reversal
Chak Tou Leong | Yi Cheng | Jiashuo Wang | Jian Wang | Wenjie Li
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Language model detoxification aims to minimize the risk of generating offensive or harmful content in pretrained language models (PLMs) for safer deployment. Existing methods can be roughly categorized as finetuning-based and decoding-based. However, the former is often resource-intensive, while the latter relies on additional components and potentially compromises the generation fluency. In this paper, we propose a more lightweight approach that enables the PLM itself to achieve “self-detoxification”. Our method is built upon the observation that prepending a negative steering prompt can effectively induce PLMs to generate toxic content. At the same time, we are inspired by the recent research in the interpretability field, which formulates the evolving contextualized representations within the PLM as an information stream facilitated by the attention layers. Drawing on this idea, we devise a method to identify the toxification direction from the normal generation process to the one prompted with the negative prefix, and then steer the generation to the reversed direction by manipulating the information movement within the attention layers. Experimental results show that our approach, without any fine-tuning or extra components, can achieve comparable performance with state-of-the-art methods.

pdf bib
Dialogue Planning via Brownian Bridge Stochastic Process for Goal-directed Proactive Dialogue
Jian Wang | Dongding Lin | Wenjie Li
Findings of the Association for Computational Linguistics: ACL 2023

Goal-directed dialogue systems aim to proactively reach a pre-determined target through multi-turn conversations. The key to achieving this task lies in planning dialogue paths that smoothly and coherently direct conversations towards the target. However, this is a challenging and under-explored task. In this work, we propose a coherent dialogue planning approach that uses a stochastic process to model the temporal dynamics of dialogue paths. We define a latent space that captures the coherence of goal-directed behavior using a Brownian bridge process, which allows us to incorporate user feedback flexibly in dialogue planning. Based on the derived latent trajectories, we generate dialogue paths explicitly using pre-trained language models. We finally employ these paths as natural language prompts to guide dialogue generation. Our experiments show that our approach generates more coherent utterances and achieves the goal with a higher success rate.

pdf bib
Medical Dialogue Generation via Dual Flow Modeling
Kaishuai Xu | Wenjun Hou | Yi Cheng | Jian Wang | Wenjie Li
Findings of the Association for Computational Linguistics: ACL 2023

Medical dialogue systems (MDS) aim to provide patients with medical services, such as diagnosis and prescription. Since most patients cannot precisely describe their symptoms, dialogue understanding is challenging for MDS. Previous studies mainly addressed this by extracting the mentioned medical entities as critical dialogue history information. In this work, we argue that it is also essential to capture the transitions of the medical entities and the doctor’s dialogue acts in each turn, as they help the understanding of how the dialogue flows and enhance the prediction of the entities and dialogue acts to be adopted in the following turn. Correspondingly, we propose a Dual Flow enhanced Medical (DFMed) dialogue generation framework. It extracts the medical entities and dialogue acts used in the dialogue history and models their transitions with an entity-centric graph flow and a sequential act flow, respectively. We employ two sequential models to encode them and devise an interweaving component to enhance their interactions. Experiments on two datasets demonstrate that our method exceeds baselines in both automatic and manual evaluations.

2022

pdf bib
多特征融合的越英端到端语音翻译方法(A Vietnamese-English end-to-end speech translation method based on multi-feature fusion)
Houli Ma (马候丽) | Ling Dong (董凌) | Wenjun Wang (王文君) | Jian Wang (王剑) | Shengxiang Gao (高盛祥) | Zhengtao Yu (余正涛)
Proceedings of the 21st Chinese National Conference on Computational Linguistics

“语音翻译的编码器需要同时编码语音中的声学和语义信息,单一的Fbank或Wav2vec2语音特征表征能力存在不足。本文通过分析人工的Fbank特征与自监督的Wav2vec2特征间的差异性,提出基于交叉注意力机制的声学特征融合方法,并探究了不同的自监督特征和融合方式,加强模型对语音中声学和语义信息的学习。结合越南语语音特点,以Fbank特征为主、Pitch特征为辅混合编码Fbank表征,构建多特征融合的越-英语音翻译模型。实验表明,使用多特征的语音翻译模型相比单特征翻译效果更优,与简单的特征拼接方法相比更有效,所提的多特征融合方法在越-英语音翻译任务上提升了1.97个BLEU值。”

pdf bib
Two Languages Are Better than One: Bilingual Enhancement for Chinese Named Entity Recognition
Jinzhong Ning | Zhihao Yang | Zhizheng Wang | Yuanyuan Sun | Hongfei Lin | Jian Wang
Proceedings of the 29th International Conference on Computational Linguistics

Chinese Named Entity Recognition (NER) has continued to attract research attention. However, most existing studies only explore the internal features of the Chinese language but neglect other lingual modal features. Actually, as another modal knowledge of the Chinese language, English contains rich prompts about entities that can potentially be applied to improve the performance of Chinese NER. Therefore, in this study, we explore the bilingual enhancement for Chinese NER and propose a unified bilingual interaction module called the Adapted Cross-Transformers with Global Sparse Attention (ACT-S) to capture the interaction of bilingual information. We utilize a model built upon several different ACT-Ss to integrate the rich English information into the Chinese representation. Moreover, our model can learn the interaction of information between bilinguals (inter-features) and the dependency information within Chinese (intra-features). Compared with existing Chinese NER methods, our proposed model can better handle entities with complex structures. The English text that enhances the model is automatically generated by machine translation, avoiding high labour costs. Experimental results on four well-known benchmark datasets demonstrate the effectiveness and robustness of our proposed model.

pdf bib
RealMedDial: A Real Telemedical Dialogue Dataset Collected from Online Chinese Short-Video Clips
Bo Xu | Hongtong Zhang | Jian Wang | Xiaokun Zhang | Dezhi Hao | Linlin Zong | Hongfei Lin | Fenglong Ma
Proceedings of the 29th International Conference on Computational Linguistics

Intelligent medical services have attracted great research interests for providing automated medical consultation. However, the lack of corpora becomes a main obstacle to related research, particularly data from real scenarios. In this paper, we construct RealMedDial, a Chinese medical dialogue dataset based on real medical consultation. RealMedDial contains 2,637 medical dialogues and 24,255 utterances obtained from Chinese short-video clips of real medical consultations. We collected and annotated a wide range of meta-data with respect to medical dialogue including doctor profiles, hospital departments, diseases and symptoms for fine-grained analysis on language usage pattern and clinical diagnosis. We evaluate the performance of medical response generation, department routing and doctor recommendation on RealMedDial. Results show that RealMedDial are applicable to a wide range of NLP tasks with respect to medical dialogue.

pdf bib
Domain-specific knowledge distillation yields smaller and better models for conversational commerce
Kristen Howell | Jian Wang | Akshay Hazare | Joseph Bradley | Chris Brew | Xi Chen | Matthew Dunn | Beth Hockey | Andrew Maurer | Dominic Widdows
Proceedings of the Fifth Workshop on e-Commerce and NLP (ECNLP 5)

We demonstrate that knowledge distillation can be used not only to reduce model size, but to simultaneously adapt a contextual language model to a specific domain. We use Multilingual BERT (mBERT; Devlin et al., 2019) as a starting point and follow the knowledge distillation approach of (Sahn et al., 2019) to train a smaller multilingual BERT model that is adapted to the domain at hand. We show that for in-domain tasks, the domain-specific model shows on average 2.3% improvement in F1 score, relative to a model distilled on domain-general data. Whereas much previous work with BERT has fine-tuned the encoder weights during task training, we show that the model improvements from distillation on in-domain data persist even when the encoder weights are frozen during task training, allowing a single encoder to support classifiers for multiple tasks and languages.

2020

pdf bib
TransS-Driven Joint Learning Architecture for Implicit Discourse Relation Recognition
Ruifang He | Jian Wang | Fengyu Guo | Yugui Han
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Implicit discourse relation recognition is a challenging task due to the lack of connectives as strong linguistic clues. Previous methods primarily encode two arguments separately or extract the specific interaction patterns for the task, which have not fully exploited the annotated relation signal. Therefore, we propose a novel TransS-driven joint learning architecture to address the issues. Specifically, based on the multi-level encoder, we 1) translate discourse relations in low-dimensional embedding space (called TransS), which could mine the latent geometric structure information of argument-relation instances; 2) further exploit the semantic features of arguments to assist discourse understanding; 3) jointly learn 1) and 2) to mutually reinforce each other to obtain the better argument representations, so as to improve the performance of the task. Extensive experimental results on the Penn Discourse TreeBank (PDTB) show that our model achieves competitive results against several state-of-the-art systems.

pdf bib
Dual Dynamic Memory Network for End-to-End Multi-turn Task-oriented Dialog Systems
Jian Wang | Junhao Liu | Wei Bi | Xiaojiang Liu | Kejing He | Ruifeng Xu | Min Yang
Proceedings of the 28th International Conference on Computational Linguistics

Existing end-to-end task-oriented dialog systems struggle to dynamically model long dialog context for interactions and effectively incorporate knowledge base (KB) information into dialog generation. To conquer these limitations, we propose a Dual Dynamic Memory Network (DDMN) for multi-turn dialog generation, which maintains two core components: dialog memory manager and KB memory manager. The dialog memory manager dynamically expands the dialog memory turn by turn and keeps track of dialog history with an updating mechanism, which encourages the model to filter irrelevant dialog history and memorize important newly coming information. The KB memory manager shares the structural KB triples throughout the whole conversation, and dynamically extracts KB information with a memory pointer at each turn. Experimental results on three benchmark datasets demonstrate that DDMN significantly outperforms the strong baselines in terms of both automatic evaluation and human evaluation. Our code is available at https://github.com/siat-nlp/DDMN.

2018

pdf bib
WECA: A WordNet-Encoded Collocation-Attention Network for Homographic Pun Recognition
Yufeng Diao | Hongfei Lin | Di Wu | Liang Yang | Kan Xu | Zhihao Yang | Jian Wang | Shaowu Zhang | Bo Xu | Dongyu Zhang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Homographic puns have a long history in human writing, widely used in written and spoken literature, which usually occur in a certain syntactic or stylistic structure. How to recognize homographic puns is an important research. However, homographic pun recognition does not solve very well in existing work. In this work, we first use WordNet to understand and expand word embedding for settling the polysemy of homographic puns, and then propose a WordNet-Encoded Collocation-Attention network model (WECA) which combined with the context weights for recognizing the puns. Our experiments on the SemEval2017 Task7 and Pun of the Day demonstrate that the proposed model is able to distinguish between homographic pun and non-homographic pun texts. We show the effectiveness of the model to present the capability of choosing qualitatively informative words. The results show that our model achieves the state-of-the-art performance on homographic puns recognition.

pdf bib
Think Visually: Question Answering through Virtual Imagery
Ankit Goyal | Jian Wang | Jia Deng
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this paper, we study the problem of geometric reasoning (a form of visual reasoning) in the context of question-answering. We introduce Dynamic Spatial Memory Network (DSMN), a new deep network architecture that specializes in answering questions that admit latent visual representations, and learns to generate and reason over such representations. Further, we propose two synthetic benchmarks, FloorPlanQA and ShapeIntersection, to evaluate the geometric reasoning capability of QA systems. Experimental results validate the effectiveness of our proposed DSMN for visual thinking tasks.

2017

pdf bib
Alibaba at IJCNLP-2017 Task 2: A Boosted Deep System for Dimensional Sentiment Analysis of Chinese Phrases
Xin Zhou | Jian Wang | Xu Xie | Changlong Sun | Luo Si
Proceedings of the IJCNLP 2017, Shared Tasks

This paper introduces Team Alibaba’s systems participating IJCNLP 2017 shared task No. 2 Dimensional Sentiment Analysis for Chinese Phrases (DSAP). The systems mainly utilize a multi-layer neural networks, with multiple features input such as word embedding, part-of-speech-tagging (POST), word clustering, prefix type, character embedding, cross sentiment input, and AdaBoost method for model training. For word level task our best run achieved MAE 0.545 (ranked 2nd), PCC 0.892 (ranked 2nd) in valence prediction and MAE 0.857 (ranked 1st), PCC 0.678 (ranked 2nd) in arousal prediction. For average performance of word and phrase task we achieved MAE 0.5355 (ranked 3rd), PCC 0.8965 (ranked 3rd) in valence prediction and MAE 0.661 (ranked 3rd), PCC 0.766 (ranked 2nd) in arousal prediction. In the final our submitted system achieved 2nd in mean rank.

2016

pdf bib
DUTIR in BioNLP-ST 2016: Utilizing Convolutional Network and Distributed Representation to Extract Complicate Relations
Honglei Li | Jianhai Zhang | Jian Wang | Hongfei Lin | Zhihao Yang
Proceedings of the 4th BioNLP Shared Task Workshop

2015

pdf bib
Biography-Dependent Collaborative Entity Archiving for Slot Filling
Yu Hong | Xiaobin Wang | Yadong Chen | Jian Wang | Tongtao Zhang | Heng Ji
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing