Chao Bian


pdf bib
Enhancing Dialogue Summarization with Topic-Aware Global- and Local- Level Centrality
Xinnian Liang | Shuangzhi Wu | Chenhao Cui | Jiaqi Bai | Chao Bian | Zhoujun Li
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Dialogue summarization aims to condense a given dialogue into a simple and focused summary text. Typically, both the roles’ viewpoints and conversational topics change in the dialogue stream. Thus how to effectively handle the shifting topics and select the most salient utterance becomes one of the major challenges of this task. In this paper, we propose a novel topic-aware Global-Local Centrality (GLC) model to help select the salient context from all sub-topics. The centralities are constructed in both global level and local level. The global one aims to identify vital sub-topics in the dialogue and the local one aims to select the most important context in each sub-topic. Specifically, the GLC collects sub-topic based on the utterance representations. And each utterance is aligned with one sub-topic. Based on the sub-topics, the GLC calculates global- and local-level centralities. Finally, we combine the two to guide the model to capture both salient context and sub-topics when generating summaries. Experimental results show that our model outperforms strong baselines on three public dialogue summarization datasets: CSDS, MC, and SAMSUM. Further analysis demonstrates that our GLC can exactly identify vital contents from sub-topics.

pdf bib
DEPN: Detecting and Editing Privacy Neurons in Pretrained Language Models
Xinwei Wu | Junzhuo Li | Minghui Xu | Weilong Dong | Shuangzhi Wu | Chao Bian | Deyi Xiong
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Pretrained language models have learned a vast amount of human knowledge from large-scale corpora, but their powerful memorization capability also brings the risk of data leakage. Some risks may only be discovered after the model training is completed, such as the model memorizing a specific phone number and frequently outputting it. In such cases, model developers need to eliminate specific data influences from the model to mitigate legal and ethical penalties. To effectively mitigate these risks, people often have to spend a significant amount of time and computational costs to retrain new models instead of finding ways to cure the ‘sick’ models. Therefore, we propose a method to locate and erase risky neurons in order to eliminate the impact of privacy data in the model. We use a new method based on integrated gradients to locate neurons associated with privacy texts, and then erase these neurons by setting their activation values to zero.Furthermore, we propose a risky neuron aggregation method to eliminate the influence of privacy data in the model in batches. Experimental results show that our method can effectively and quickly eliminate the impact of privacy data without affecting the model’s performance. Additionally, we demonstrate the relationship between model memorization and neurons through experiments, further illustrating the robustness of our method.


pdf bib
Towards Modeling Role-Aware Centrality for Dialogue Summarization
Xinnian Liang | Chao Bian | Shuangzhi Wu | Zhoujun Li
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Role-oriented dialogue summarization generates summaries for different roles in dialogue (e.g. doctor and patient). Existing methods consider roles separately where interactions among different roles are not fully explored. In this paper, we propose a novel Role-Aware Centrality (RAC) model to capture role interactions, which can be easily applied to any seq2seq models. The RAC assigns each role a specific sentence-level centrality score by involving role prompts to control what kind of summary to generate. The RAC measures both the importance of utterances and the relevance between roles and utterances. Then we use RAC to re-weight context representations, which are used by the decoder to generate role summaries. We verify RAC on two public benchmark datasets, CSDS and MC. Experimental results show that the proposed method achieves new state-of-the-art results on the two datasets. Extensive analyses have demonstrated that the role-aware centrality helps generate summaries more precisely.


pdf bib
Tencent Neural Machine Translation Systems for WMT18
Mingxuan Wang | Li Gong | Wenhuan Zhu | Jun Xie | Chao Bian
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

We participated in the WMT 2018 shared news translation task on English↔Chinese language pair. Our systems are based on attentional sequence-to-sequence models with some form of recursion and self-attention. Some data augmentation methods are also introduced to improve the translation performance. The best translation result is obtained with ensemble and reranking techniques. Our Chinese→English system achieved the highest cased BLEU score among all 16 submitted systems, and our English→Chinese system ranked the third out of 18 submitted systems.

pdf bib
Neural Machine Translation with Decoding History Enhanced Attention
Mingxuan Wang | Jun Xie | Zhixing Tan | Jinsong Su | Deyi Xiong | Chao Bian
Proceedings of the 27th International Conference on Computational Linguistics

Neural machine translation with source-side attention have achieved remarkable performance. however, there has been little work exploring to attend to the target-side which can potentially enhance the memory capbility of NMT. We reformulate a Decoding History Enhanced Attention mechanism (DHEA) to render NMT model better at selecting both source-side and target-side information. DHA enables dynamic control of the ratios at which source and target contexts contribute to the generation of target words, offering a way to weakly induce structure relations among both source and target tokens. It also allows training errors to be directly back-propagated through short-cut connections and effectively alleviates the gradient vanishing problem. The empirical study on Chinese-English translation shows that our model with proper configuration can improve by 0:9 BLEU upon Transformer and the best reported results in the dataset. On WMT14 English-German task and a larger WMT14 English-French task, our model achieves comparable results with the state-of-the-art.