Conversation- and Tree-Structure Losses for Dialogue Disentanglement
Tianda Li | Jia-Chen Gu | Zhen-Hua Ling | Quan Liu
Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering
When multiple conversations occur simultaneously, a listener must decide which conversation each utterance is part of in order to interpret and respond to it appropriately. This task is referred to as dialogue disentanglement. A significant drawback of previous studies on disentanglement is that they focus only on pair-wise relationships between utterances while neglecting the conversation structure, which is important for modeling the conversations. In this paper, we propose a hierarchical model, named Dialogue BERT (DIALBERT), which integrates the local and global semantics in the context range by using BERT to encode each message-pair and using BiLSTM to aggregate the chronological context information into the output of BERT. In order to integrate the conversation structure information into the model, two types of loss, a conversation-structure loss and a tree-structure loss, are designed. In this way, our model can implicitly learn and leverage the conversation structures without being restricted by the lack of explicit access to such structures during the inference stage. Experimental results on two large datasets show that our method outperforms previous methods by substantial margins on dialogue disentanglement.
How to Select One Among All ? An Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding
Tianda Li | Ahmad Rashid | Aref Jafari | Pranav Sharma | Ali Ghodsi | Mehdi Rezagholizadeh
Findings of the Association for Computational Linguistics: EMNLP 2021
Knowledge Distillation (KD) is a model compression algorithm that helps transfer the knowledge in a large neural network into a smaller one. Even though KD has shown promise on a wide range of Natural Language Processing (NLP) applications, little is understood about how one KD algorithm compares to another and whether these approaches can be complementary to each other. In this work, we evaluate various KD algorithms on in-domain, out-of-domain and adversarial testing. We propose a framework to assess adversarial robustness of multiple KD algorithms. Moreover, we introduce a new KD algorithm, Combined-KD, which takes advantage of two promising approaches (better training scheme and more efficient data augmentation). Our extensive experimental results show that Combined-KD achieves state-of-the-art results on the GLUE benchmark, out-of-domain generalization, and adversarial robustness compared to competitive methods.