From Text Segmentation to Enhanced Representation Learning: A Novel Approach to Multi-Label Classification for Long Texts

Wang Zhang, Xin Wang, Qian Wang, Tao Deng, Xiaoru Wu


Abstract
Multi-label text classification (MLTC) is an important task in natural language processing. Most existing models rely on high-quality text representations provided by pre-trained language models (PLMs), and therefore run into the PLMs' input length limit when dealing with long texts. In light of this, we introduce a comprehensive approach to multi-label long text classification. To address the input length limit, we propose a text segmentation algorithm that is guaranteed to produce the optimal segmentation. We incorporate external knowledge, label co-occurrence relations, and attention mechanisms into representation learning to enhance both text and label representations. Extensive experiments on various MLTC datasets validate the effectiveness of our method and reveal the intricate correlations between texts and labels.
Anthology ID:
2024.findings-emnlp.402
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
6864–6873
URL:
https://aclanthology.org/2024.findings-emnlp.402
Cite (ACL):
Wang Zhang, Xin Wang, Qian Wang, Tao Deng, and Xiaoru Wu. 2024. From Text Segmentation to Enhanced Representation Learning: A Novel Approach to Multi-Label Classification for Long Texts. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 6864–6873, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
From Text Segmentation to Enhanced Representation Learning: A Novel Approach to Multi-Label Classification for Long Texts (Zhang et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-emnlp.402.pdf
Software:
2024.findings-emnlp.402.software.zip
Data:
2024.findings-emnlp.402.data.zip