Zhenglu Yang


2022

pdf bib
Multi-Party Empathetic Dialogue Generation: A New Task for Dialog Systems
Ling.Yu Zhu | Zhengkun Zhang | Jun Wang | Hongbin Wang | Haiying Wu | Zhenglu Yang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Empathetic dialogue assembles emotion understanding, feeling projection, and appropriate response generation. Existing work for empathetic dialogue generation concentrates on the two-party conversation scenario. Multi-party dialogues, however, are pervasive in reality. Furthermore, emotion and sensibility are typically confused; a refined empathy analysis is needed for comprehending fragile and nuanced human feelings. We address these issues by proposing a novel task called Multi-Party Empathetic Dialogue Generation in this study. Additionally, a Static-Dynamic model for Multi-Party Empathetic Dialogue Generation, SDMPED, is introduced as a baseline by exploring the static sensibility and dynamic emotion for the multi-party empathetic dialogue learning, the aspects that help SDMPED achieve the state-of-the-art performance.

pdf bib
Modeling Temporal-Modal Entity Graph for Procedural Multimodal Machine Comprehension
Huibin Zhang | Zhengkun Zhang | Yao Zhang | Jun Wang | Yufan Li | Ning Jiang | Xin Wei | Zhenglu Yang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Procedural Multimodal Documents (PMDs) organize textual instructions and corresponding images step by step. Comprehending PMDs and inducing their representations for the downstream reasoning tasks is designated as Procedural MultiModal Machine Comprehension (M3C). In this study, we approach Procedural M3C at a fine-grained level (compared with existing explorations at a document or sentence level), that is, entity. With delicate consideration, we model entity both in its temporal and cross-modal relation and propose a novel Temporal-Modal Entity Graph (TMEG). Specifically, graph structure is formulated to capture textual and visual entities and trace their temporal-modal evolution. In addition, a graph aggregation module is introduced to conduct graph encoding and reasoning. Comprehensive experiments across three Procedural M3C tasks are conducted on a traditional dataset RecipeQA and our new dataset CraftQA, which can better evaluate the generalization of TMEG.

pdf bib
Fact-Tree Reasoning for N-ary Question Answering over Knowledge Graphs
Yao Zhang | Peiyao Li | Hongru Liang | Adam Jatowt | Zhenglu Yang
Findings of the Association for Computational Linguistics: ACL 2022

Current Question Answering over Knowledge Graphs (KGQA) task mainly focuses on performing answer reasoning upon KGs with binary facts. However, it neglects the n-ary facts, which contain more than two entities. In this work, we highlight a more challenging but under-explored task: n-ary KGQA, i.e., answering n-ary facts questions upon n-ary KGs. Nevertheless, the multi-hop reasoning framework popular in binary KGQA task is not directly applicable on n-ary KGQA. We propose two feasible improvements: 1) upgrade the basic reasoning unit from entity or relation to fact, and 2) upgrade the reasoning structure from chain to tree. Therefore, we propose a novel fact-tree reasoning framework, FacTree, which integrates the above two upgrades. FacTree transforms the question into a fact tree and performs iterative fact reasoning on the fact tree to infer the correct answer. Experimental results on the n-ary KGQA dataset we constructed and two binary KGQA benchmarks demonstrate the effectiveness of FacTree compared with state-of-the-art methods.

2021

pdf bib
RepSum: Unsupervised Dialogue Summarization based on Replacement Strategy
Xiyan Fu | Yating Zhang | Tianyi Wang | Xiaozhong Liu | Changlong Sun | Zhenglu Yang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In the field of dialogue summarization, due to the lack of training data, it is often difficult for supervised summary generation methods to learn vital information from dialogue context with limited data. Several attempts on unsupervised summarization for text by leveraging semantic information solely or auto-encoder strategy (i.e., sentence compression), it however cannot be adapted to the dialogue scene due to the limited words in utterances and huge gap between the dialogue and its summary. In this study, we propose a novel unsupervised strategy to address this challenge, which roots from the hypothetical foundation that a superior summary approximates a replacement of the original dialogue, and they are roughly equivalent for auxiliary (self-supervised) tasks, e.g., dialogue generation. The proposed strategy RepSum is applied to generate both extractive and abstractive summary with the guidance of the followed nˆth utterance generation and classification tasks. Extensive experiments on various datasets demonstrate the superiority of the proposed model compared with the state-of-the-art methods.

pdf bib
MM-AVS: A Full-Scale Dataset for Multi-modal Summarization
Xiyan Fu | Jun Wang | Zhenglu Yang
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Multimodal summarization becomes increasingly significant as it is the basis for question answering, Web search, and many other downstream tasks. However, its learning materials have been lacking a holistic organization by integrating resources from various modalities, thereby lagging behind the research progress of this field. In this study, we release a full-scale multimodal dataset comprehensively gathering documents, summaries, images, captions, videos, audios, transcripts, and titles in English from CNN and Daily Mail. To our best knowledge, this is the first collection that spans all modalities and nearly comprises all types of materials available in this community. In addition, we devise a baseline model based on the novel dataset, which employs a newly proposed Jump-Attention mechanism based on transcripts. The experimental results validate the important assistance role of the external information for multimodal summarization.

pdf bib
GMH: A General Multi-hop Reasoning Model for KG Completion
Yao Zhang | Hongru Liang | Adam Jatowt | Wenqiang Lei | Xin Wei | Ning Jiang | Zhenglu Yang
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Knowledge graphs are essential for numerous downstream natural language processing applications, but are typically incomplete with many facts missing. This results in research efforts on multi-hop reasoning task, which can be formulated as a search process and current models typically perform short distance reasoning. However, the long-distance reasoning is also vital with the ability to connect the superficially unrelated entities. To the best of our knowledge, there lacks a general framework that approaches multi-hop reasoning in mixed long-short distance reasoning scenarios. We argue that there are two key issues for a general multi-hop reasoning model: i) where to go, and ii) when to stop. Therefore, we propose a general model which resolves the issues with three modules: 1) the local-global knowledge module to estimate the possible paths, 2) the differentiated action dropout module to explore a diverse set of paths, and 3) the adaptive stopping search module to avoid over searching. The comprehensive results on three datasets demonstrate the superiority of our model with significant improvements against baselines in both short and long distance reasoning scenarios.

2020

pdf bib
Curriculum Pre-training for End-to-End Speech Translation
Chengyi Wang | Yu Wu | Shujie Liu | Ming Zhou | Zhenglu Yang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

End-to-end speech translation poses a heavy burden on the encoder because it has to transcribe, understand, and learn cross-lingual semantics simultaneously. To obtain a powerful encoder, traditional methods pre-train it on ASR data to capture speech features. However, we argue that pre-training the encoder only through simple speech recognition is not enough, and high-level linguistic knowledge should be considered. Inspired by this, we propose a curriculum pre-training method that includes an elementary course for transcription learning and two advanced courses for understanding the utterance and mapping words in two languages. The difficulty of these courses is gradually increasing. Experiments show that our curriculum pre-training method leads to significant improvements on En-De and En-Fr speech translation benchmarks.

2019

pdf bib
Attention Optimization for Abstractive Document Summarization
Min Gui | Junfeng Tian | Rui Wang | Zhenglu Yang
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Attention plays a key role in the improvement of sequence-to-sequence-based document summarization models. To obtain a powerful attention helping with reproducing the most salient information and avoiding repetitions, we augment the vanilla attention model from both local and global aspects. We propose attention refinement unit paired with local variance loss to impose supervision on the attention model at each decoding step, and we also propose a global variance loss to optimize the attention distributions of all decoding steps from the global perspective. The performances on CNN/Daily Mail dataset verify the effectiveness of our methods.

2018

pdf bib
JTAV: Jointly Learning Social Media Content Representation by Fusing Textual, Acoustic, and Visual Features
Hongru Liang | Haozheng Wang | Jun Wang | Shaodi You | Zhe Sun | Jin-Mao Wei | Zhenglu Yang
Proceedings of the 27th International Conference on Computational Linguistics

Learning social media content is the basis of many real-world applications, including information retrieval and recommendation systems, among others. In contrast with previous works that focus mainly on single modal or bi-modal learning, we propose to learn social media content by fusing jointly textual, acoustic, and visual information (JTAV). Effective strategies are proposed to extract fine-grained features of each modality, that is, attBiGRU and DCRNN. We also introduce cross-modal fusion and attentive pooling techniques to integrate multi-modal information comprehensively. Extensive experimental evaluation conducted on real-world datasets demonstrate our proposed model outperforms the state-of-the-art approaches by a large margin.

pdf bib
A Multi-Attention based Neural Network with External Knowledge for Story Ending Predicting Task
Qian Li | Ziwei Li | Jin-Mao Wei | Yanhui Gu | Adam Jatowt | Zhenglu Yang
Proceedings of the 27th International Conference on Computational Linguistics

Enabling a mechanism to understand a temporal story and predict its ending is an interesting issue that has attracted considerable attention, as in case of the ROC Story Cloze Task (SCT). In this paper, we develop a multi-attention-based neural network (MANN) with well-designed optimizations, like Highway Network, and concatenated features with embedding representations into the hierarchical neural network model. Considering the particulars of the specific task, we thoughtfully extend MANN with external knowledge resources, exceeding state-of-the-art results obviously. Furthermore, we develop a thorough understanding of our model through a careful hand analysis on a subset of the stories. We identify what traits of MANN contribute to its outperformance and how external knowledge is obtained in such an ending prediction task.

2017

pdf bib
Variation Autoencoder Based Network Representation Learning for Classification
Hang Li | Haozheng Wang | Zhenglu Yang | Masato Odagaki
Proceedings of ACL 2017, Student Research Workshop

2016

pdf bib
A Fast Approach for Semantic Similar Short Texts Retrieval
Yanhui Gu | Zhenglu Yang | Junsheng Zhou | Weiguang Qu | Jinmao Wei | Xingtian Shi
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2009

pdf bib
Unsupervised Relation Extraction by Mining Wikipedia Texts Using Information from the Web
Yulan Yan | Naoaki Okazaki | Yutaka Matsuo | Zhenglu Yang | Mitsuru Ishizuka
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP