Zhexin Zhang


2022

Selecting Stickers in Open-Domain Dialogue through Multitask Learning
Zhexin Zhang | Yeshuang Zhu | Zhengcong Fei | Jinchao Zhang | Jie Zhou
Findings of the Association for Computational Linguistics: ACL 2022

With the increasing popularity of online chatting, stickers are becoming an important part of online communication. Selecting appropriate stickers in open-domain dialogue requires a comprehensive understanding of both dialogues and stickers, as well as the relationship between the two modalities. To tackle these challenges, we propose a multitask learning method comprising three auxiliary tasks that enhance the understanding of dialogue history and of the emotion and semantic meaning of stickers. Extensive experiments conducted on a recent challenging dataset show that our model can better combine the multimodal information and achieve significantly higher accuracy than strong baselines. An ablation study further verifies the effectiveness of each auxiliary task. Our code is available at https://github.com/nonstopfor/Sticker-Selection.
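
As a rough illustration of the multitask idea in this abstract (not the authors' implementation, which is in the linked repository), the sketch below pairs a main dialogue-sticker matching loss with one weighted auxiliary loss in PyTorch. The toy encoders, the emotion head, the feature sizes, and the loss weight are all assumptions made for the example.

```python
# Hedged sketch: a generic multitask setup combining a dialogue-sticker matching
# loss with one auxiliary loss (here, sticker emotion classification). The encoders,
# feature sizes, and loss weight are illustrative placeholders, not the paper's model.
import torch
import torch.nn as nn

class MultitaskStickerSelector(nn.Module):
    def __init__(self, text_dim=768, image_dim=512, hidden=256, num_emotions=8):
        super().__init__()
        self.dialogue_proj = nn.Linear(text_dim, hidden)      # stand-in for a dialogue encoder
        self.sticker_proj = nn.Linear(image_dim, hidden)      # stand-in for a sticker encoder
        self.emotion_head = nn.Linear(hidden, num_emotions)   # auxiliary head: sticker emotion

    def forward(self, dialogue_feat, sticker_feat):
        d = self.dialogue_proj(dialogue_feat)                 # (batch, hidden)
        s = self.sticker_proj(sticker_feat)                   # (batch, hidden)
        match_logits = d @ s.t()                              # in-batch dialogue-sticker matching
        emotion_logits = self.emotion_head(s)                 # auxiliary prediction
        return match_logits, emotion_logits

def multitask_loss(match_logits, emotion_logits, emotion_labels, aux_weight=0.5):
    # Main loss: each dialogue should select its paired sticker within the batch;
    # the auxiliary loss is added with a small weight.
    targets = torch.arange(match_logits.size(0))
    ce = nn.CrossEntropyLoss()
    return ce(match_logits, targets) + aux_weight * ce(emotion_logits, emotion_labels)
```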

Constructing Highly Inductive Contexts for Dialogue Safety through Controllable Reverse Generation
Zhexin Zhang | Jiale Cheng | Hao Sun | Jiawen Deng | Fei Mi | Yasheng Wang | Lifeng Shang | Minlie Huang
Findings of the Association for Computational Linguistics: EMNLP 2022

Large pretrained language models can easily produce toxic or biased content, which hinders their practical use. To detect such toxic generations, existing methods rely on templates, real-world data extraction, crowdsourced workers, or automatic generation to construct adversarial contexts that are likely to induce toxic generations. However, what type of context is more likely to induce unsafe responses remains under-explored. In this paper, we identify context toxicity and context category (e.g., profanity, insult, drugs, etc.) as two important factors that cause safety issues in response generation. Hence, we propose a method called reverse generation to construct adversarial contexts conditioned on a given response, with the flexibility to control the category, toxicity level, and inductivity of the generated contexts. Via reverse generation, we augment the existing BAD dataset and construct a new dataset, BAD+, which contains more than 120K diverse and highly inductive contexts in 12 categories. We test three popular pretrained dialogue models (Blender, DialoGPT, and Plato2) and find that BAD+ can largely expose their safety problems. Furthermore, we show that BAD+ can greatly enhance the safety of generation, and we reveal the key factors behind the safety improvement. Our code and dataset are available at https://github.com/thu-coai/Reverse_Generation.
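
To make the idea of conditioning on a response concrete, here is a minimal sketch of controlled reverse generation with a causal language model from the transformers library. The control-code prompt format, the base model (gpt2), and the decoding settings are assumptions for illustration only; the authors' code and the BAD+ data are in the linked repository.

```python
# Hedged sketch: generate a context conditioned on a given response, with simple
# control codes for category and toxicity. The prompt format is hypothetical, and the
# base model is assumed to have been fine-tuned on (controls + response -> context) pairs.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def reverse_generate(response, category="insult", toxicity="toxic", max_new_tokens=40):
    prompt = f"[{category}] [{toxicity}] Response: {response} Context:"
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,                        # sampling yields diverse adversarial contexts
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Return only the newly generated context, not the conditioning prompt.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

print(reverse_generate("I would never say something like that."))
```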

Persona-Guided Planning for Controlling the Protagonist’s Persona in Story Generation
Zhexin Zhang | Jiaxin Wen | Jian Guan | Minlie Huang
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Endowing the protagonist with a specific personality is essential for writing an engaging story. In this paper, we aim to control the protagonist’s persona in story generation, i.e., to generate a story from a leading context and a persona description in which the protagonist exhibits the specified personality through a coherent event sequence. Considering that personas are usually embodied implicitly and sparsely in stories, we propose a planning-based generation model named ConPer to explicitly model the relationship between personas and events. ConPer first plans events of the protagonist’s behavior that are motivated by the specified persona by predicting one target sentence, then plans the plot as a sequence of keywords guided by the predicted persona-related events and commonsense knowledge, and finally generates the whole story. Both automatic and manual evaluation results demonstrate that ConPer outperforms state-of-the-art baselines, generating more coherent and persona-controllable stories. Our code is available at https://github.com/thu-coai/ConPer.
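
As a loose illustration of the plan-then-generate pipeline described above (target event, then keyword plot, then story), the sketch below chains three prompted generation calls. The prompt formats, the base model, and the helper function are hypothetical; ConPer itself is a trained model whose actual code is at the linked repository.

```python
# Hedged sketch: a three-stage plan-then-write pipeline in the spirit of the described
# approach. Prompts, model choice, and decoding settings are placeholders only.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

def complete(prompt, max_new_tokens=60):
    out = generator(prompt, max_new_tokens=max_new_tokens, do_sample=True, top_p=0.9)
    return out[0]["generated_text"][len(prompt):].strip()

def generate_persona_story(leading_context, persona):
    # Stage 1: predict one target sentence describing persona-motivated behavior.
    target = complete(f"Persona: {persona}\nContext: {leading_context}\nTarget event:")
    # Stage 2: plan the plot as a keyword sequence guided by the target event.
    keywords = complete(f"Persona: {persona}\nTarget event: {target}\nPlot keywords:")
    # Stage 3: generate the whole story conditioned on context, persona, and plan.
    return complete(
        f"Context: {leading_context}\nPersona: {persona}\nPlan: {keywords}\nStory:"
    )

print(generate_persona_story("Anna stepped off the train into the rain.", "brave and curious"))
```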

2021

OpenMEVA: A Benchmark for Evaluating Open-ended Story Generation Metrics
Jian Guan | Zhexin Zhang | Zhuoer Feng | Zitao Liu | Wenbiao Ding | Xiaoxi Mao | Changjie Fan | Minlie Huang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Automatic metrics are essential for developing natural language generation (NLG) models, particularly for open-ended generation tasks such as story generation. However, existing automatic metrics are observed to correlate poorly with human evaluation. The lack of standardized benchmark datasets makes it difficult to fully evaluate a metric’s capabilities and to fairly compare different metrics. Therefore, we propose OpenMEVA, a benchmark for evaluating open-ended story generation metrics. OpenMEVA provides a comprehensive test suite to assess the capabilities of metrics, including (a) the correlation with human judgments, (b) the generalization to different model outputs and datasets, (c) the ability to judge story coherence, and (d) the robustness to perturbations. To this end, OpenMEVA includes both manually annotated stories and automatically constructed test examples. We evaluate existing metrics on OpenMEVA and observe that they correlate poorly with human judgments, fail to recognize discourse-level incoherence, and lack inferential knowledge (e.g., the causal order between events), generalization ability, and robustness. Our study offers insights for developing NLG models and metrics in future research.
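
For readers unfamiliar with how such a benchmark scores a candidate metric, the following toy sketch shows the core correlation check: scoring stories with the metric and comparing against human ratings. The placeholder "metric" and data are invented for illustration; OpenMEVA's actual test suites and annotations are released with the paper.

```python
# Hedged sketch: the basic correlation test a metric benchmark runs. The naive
# length-based "metric" and the toy data below are placeholders, not part of OpenMEVA.
from scipy.stats import pearsonr, spearmanr

def correlate_with_humans(metric_fn, stories, human_scores):
    """Score each story with the metric and report correlation with human judgments."""
    metric_scores = [metric_fn(s) for s in stories]
    pearson, _ = pearsonr(metric_scores, human_scores)
    spearman, _ = spearmanr(metric_scores, human_scores)
    return pearson, spearman

stories = [
    "He woke up. The end.",
    "She found a map, followed it to the coast, and finally met her brother.",
    "The dog barked. A car. Then breakfast happened yesterday tomorrow.",
    "After years of practice, Mia won the regional piano contest and cried with joy.",
]
human_scores = [1.5, 4.0, 1.0, 4.5]           # hypothetical coherence ratings (1-5)
toy_metric = lambda s: float(len(s.split()))  # deliberately naive length-based metric
print(correlate_with_humans(toy_metric, stories, human_scores))
```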