Quoc-An Nguyen


2025

Beyond the Scientific Document: A Citation-Aware Multi-Granular Summarization Approach with Heterogeneous Graphs
Quoc-An Nguyen | Xuan-Hung Le | Thi-Minh-Thu Vu | Hoang-Quynh Le
Findings of the Association for Computational Linguistics: EMNLP 2025

Scientific summarization remains a challenging task due to the complex characteristics of a document's internal structure and its external relations to other documents. To address this, our proposed model constructs a heterogeneous graph that represents a document together with its relevant external citations. This heterogeneous graph enables the model to exploit information across multiple granularities, ranging from fine-grained textual components to the global document structure, and from internal content to external citation context, which facilitates context-aware representations and effectively reduces redundancy. In addition, we develop an effective encoder based on a multi-granularity graph attention mechanism and a triplet loss objective to enhance representation learning. Experimental results across three different scenarios consistently demonstrate that our model outperforms existing approaches. Source code is available at: https://github.com/quocanuetcs/CiteHeteroSum.
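The sketch below illustrates only the triplet loss objective mentioned in the abstract, using randomly generated placeholder embeddings in place of the outputs of the multi-granularity graph encoder; the embedding dimension and batch size are illustrative assumptions and do not come from the released CiteHeteroSum code.

```python
import torch
import torch.nn as nn

# Placeholder sentence embeddings; in the actual model these would come from
# the heterogeneous-graph encoder, here they are random tensors for illustration.
embedding_dim = 256  # assumed dimension, not taken from the paper
anchor = torch.randn(8, embedding_dim, requires_grad=True)
positive = torch.randn(8, embedding_dim, requires_grad=True)
negative = torch.randn(8, embedding_dim, requires_grad=True)

# Standard triplet margin loss: pull anchor and positive embeddings together,
# push anchor and negative embeddings at least `margin` apart.
triplet_loss = nn.TripletMarginLoss(margin=1.0)
loss = triplet_loss(anchor, positive, negative)
loss.backward()  # gradients would flow back into the encoder producing the embeddings
print(float(loss))
```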

2021

UETrice at MEDIQA 2021: A Prosper-thy-neighbour Extractive Multi-document Summarization Model
Duy-Cat Can | Quoc-An Nguyen | Quoc-Hung Duong | Minh-Quang Nguyen | Huy-Son Nguyen | Linh Nguyen Tran Ngoc | Quang-Thuy Ha | Mai-Vu Tran
Proceedings of the 20th Workshop on Biomedical Language Processing

This paper describes a system developed for the multi-answer summarization challenge of the MEDIQA 2021 shared task, collocated with the BioNLP 2021 Workshop. We propose an extractive summarization architecture based on several scores and state-of-the-art techniques. We also present our novel prosper-thy-neighbour strategies to improve performance. Our model proved effective, achieving the best ROUGE-1/ROUGE-L scores and finishing as the shared-task runner-up by ROUGE-2 F1 score (among 13 participating teams).
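For illustration of score-based extractive selection in general, the sketch below greedily picks sentences by TF-IDF relevance to the question minus a redundancy penalty against already-selected sentences. This is a generic baseline under assumed scoring choices, not the scoring scheme or the prosper-thy-neighbour strategy used by the UETrice system.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def extract_summary(question, sentences, k=3, redundancy_weight=0.5):
    """Greedily pick k sentences: relevance to the question minus a
    penalty for similarity to sentences already selected (generic sketch)."""
    vectorizer = TfidfVectorizer().fit([question] + sentences)
    q_vec = vectorizer.transform([question])
    s_vecs = vectorizer.transform(sentences)
    relevance = cosine_similarity(s_vecs, q_vec).ravel()

    selected = []
    while len(selected) < min(k, len(sentences)):
        best_idx, best_score = None, float("-inf")
        for i in range(len(sentences)):
            if i in selected:
                continue
            # Penalize sentences too similar to anything already chosen.
            redundancy = cosine_similarity(s_vecs[i], s_vecs[selected]).max() if selected else 0.0
            score = relevance[i] - redundancy_weight * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
    return [sentences[i] for i in sorted(selected)]

# Toy usage with made-up candidate answer sentences.
answers = [
    "Vitamin D supports calcium absorption and bone health.",
    "Calcium absorption and bone health are supported by vitamin D.",
    "Excessive vitamin D intake can cause toxicity.",
]
print(extract_summary("What does vitamin D do?", answers, k=2))
```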

UETfishes at MEDIQA 2021: Standing-on-the-Shoulders-of-Giants Model for Abstractive Multi-answer Summarization
Hoang-Quynh Le | Quoc-An Nguyen | Quoc-Hung Duong | Minh-Quang Nguyen | Huy-Son Nguyen | Tam Doan Thanh | Hai-Yen Thi Vuong | Trang M. Nguyen
Proceedings of the 20th Workshop on Biomedical Language Processing

This paper describes a system developed for the multi-answer summarization challenge of the MEDIQA 2021 shared task, collocated with the BioNLP 2021 Workshop. We present an abstractive summarization model based on BART, a denoising auto-encoder for pre-training sequence-to-sequence models. Focusing on summarizing answers to consumer health questions, we propose a query-driven filtering phase that automatically selects useful information from the input documents. Our approach achieves promising results, ranking 2nd (evaluated on extractive references) and 3rd (evaluated on abstractive references) in the final evaluation.
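As a rough illustration of the BART-based abstractive step, the sketch below runs an off-the-shelf pretrained BART checkpoint over text assumed to have already passed a query-driven filtering stage. The checkpoint name ("facebook/bart-large-cnn") and generation settings are assumptions for the example; they are not the fine-tuned model or hyperparameters from the UETfishes system.

```python
from transformers import BartForConditionalGeneration, BartTokenizer

# Generic pretrained BART summarizer; an illustrative checkpoint, not the
# fine-tuned model described in the paper.
model_name = "facebook/bart-large-cnn"
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

def summarize(filtered_answers, max_input_tokens=1024, max_summary_tokens=150):
    """Summarize answer text assumed to have passed query-driven filtering."""
    inputs = tokenizer(" ".join(filtered_answers), truncation=True,
                       max_length=max_input_tokens, return_tensors="pt")
    summary_ids = model.generate(inputs["input_ids"], num_beams=4,
                                 max_length=max_summary_tokens, early_stopping=True)
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)

# Toy usage with made-up filtered answer passages.
print(summarize([
    "Vitamin D helps the body absorb calcium, which keeps bones strong.",
    "A deficiency of vitamin D can lead to weakened bones in adults.",
]))
```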