Khac-Hoai Nam Bui


2022

pdf bib
Multi Graph Neural Network for Extractive Long Document Summarization
Xuan-Dung Doan | Le-Minh Nguyen | Khac-Hoai Nam Bui
Proceedings of the 29th International Conference on Computational Linguistics

Heterogeneous Graph Neural Networks (HeterGNN) have been recently introduced as an emergent approach for extracting document summarization (EDS) by exploiting the cross-relations between words and sentences. However, applying HeterGNN for long documents is still an open research issue. One of the main majors is the lacking of inter-sentence connections. In this regard, this paper exploits how to apply HeterGNN for long documents by building a graph on sentence-level nodes (homogeneous graph) and combine with HeterGNN for capturing the semantic information in terms of both inter and intra-sentence connections. Experiments on two benchmark datasets of long documents such as PubMed and ArXiv show that our method is able to achieve state-of-the-art results in this research field.

pdf bib
HeterGraphLongSum: Heterogeneous Graph Neural Network with Passage Aggregation for Extractive Long Document Summarization
Tuan-Anh Phan | Ngoc-Dung Ngoc Nguyen | Khac-Hoai Nam Bui
Proceedings of the 29th International Conference on Computational Linguistics

Graph Neural Network (GNN)-based models have proven effective in various Natural Language Processing (NLP) tasks in recent years. Specifically, in the case of the Extractive Document Summarization (EDS) task, modeling documents under graph structure is able to analyze the complex relations between semantic units (e.g., word-to-word, word-to-sentence, sentence-to-sentence) and enrich sentence representations via valuable information from their neighbors. However, long-form document summarization using graph-based methods is still an open research issue. The main challenge is to represent long documents in a graph structure in an effective way. In this regard, this paper proposes a new heterogeneous graph neural network (HeterGNN) model to improve the performance of long document summarization (HeterGraphLongSum). Specifically, the main idea is to add the passage nodes into the heterogeneous graph structure of word and sentence nodes for enriching the final representation of sentences. In this regard, HeterGraphLongSum is designed with three types of semantic units such as word, sentence, and passage. Experiments on two benchmark datasets for long documents such as Pubmed and Arxiv indicate promising results of the proposed model for the extractive long document summarization problem. Especially, HeterGraphLongSum is able to achieve state-of-the-art performance without relying on any pre-trained language models (e.g., BERT). The source code is available for further exploitation on the Github.

pdf bib
Extractive Text Summarization with Latent Topics using Heterogeneous Graph Neural Network
Tuan Anh Phan | Ngoc Dung Nguyen | Khac-Hoai Nam Bui
Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation