Text Graph Transformer for Document Classification

Haopeng Zhang, Jiawei Zhang


Abstract
Text classification is a fundamental problem in natural language processing. Recent studies have applied graph neural network (GNN) techniques to capture global word co-occurrence in a corpus. However, previous works do not scale to large corpora and ignore the heterogeneity of the text graph. To address these problems, we introduce a novel Transformer-based heterogeneous graph neural network, namely Text Graph Transformer (TG-Transformer). Our model learns effective node representations by capturing the structure and heterogeneity of the text graph. We propose a mini-batch text graph sampling method that significantly reduces computing and memory costs, enabling the model to handle large corpora. Extensive experiments on several benchmark datasets demonstrate that TG-Transformer outperforms state-of-the-art approaches on the text classification task.
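The scalability idea in the abstract, sampling a bounded subgraph around each batch of documents so that memory cost does not grow with corpus size, can be illustrated with a minimal sketch. The code below is an assumption-based illustration of mini-batch neighbor sampling on a heterogeneous document-word graph, not the paper's actual algorithm; the names `sample_subgraph`, `doc_to_words`, and `num_word_samples` are hypothetical.

```python
# Hypothetical sketch of mini-batch text graph sampling: for each document
# node in a batch, keep a fixed-size random sample of its word neighbors so
# the resulting subgraph is bounded regardless of corpus size. This is an
# illustration of the general technique, not the paper's exact method.
import random
from typing import Dict, List, Set, Tuple

def sample_subgraph(
    doc_batch: List[str],
    doc_to_words: Dict[str, List[str]],  # document node -> word-node neighbors
    num_word_samples: int = 20,
    seed: int = 0,
) -> Tuple[Set[str], List[Tuple[str, str]]]:
    """Return the node set and document-word edges of one mini-batch subgraph."""
    rng = random.Random(seed)
    nodes: Set[str] = set(doc_batch)
    edges: List[Tuple[str, str]] = []
    for doc in doc_batch:
        neighbors = doc_to_words.get(doc, [])
        # Cap the neighborhood so memory per batch stays constant.
        if len(neighbors) > num_word_samples:
            neighbors = rng.sample(neighbors, num_word_samples)
        for word in neighbors:
            nodes.add(word)
            edges.append((doc, word))
    return nodes, edges

if __name__ == "__main__":
    corpus = {
        "doc1": ["graph", "neural", "network", "text"],
        "doc2": ["text", "classification", "transformer"],
    }
    nodes, edges = sample_subgraph(["doc1", "doc2"], corpus, num_word_samples=3)
    print(sorted(nodes))
    print(edges)
```

In a full model, each sampled subgraph would then be encoded by a heterogeneous graph Transformer layer that treats document and word nodes as distinct types; the sketch only shows the sampling step that bounds per-batch cost.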
Anthology ID:
2020.emnlp-main.668
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
8322–8327
URL:
https://aclanthology.org/2020.emnlp-main.668
DOI:
10.18653/v1/2020.emnlp-main.668
PDF:
https://aclanthology.org/2020.emnlp-main.668.pdf
Video:
https://slideslive.com/38938916