Text Categorization by Learning Predominant Sense of Words as Auxiliary Task

Kazuya Shimura, Jiyi Li, Fumiyo Fukumoto


Abstract
The sense distributions of words are often highly skewed and strongly influenced by the domain of a document. This paper builds on that observation and presents a method for text categorization that leverages the predominant sense of each word in a given domain, i.e., domain-specific senses. The key idea is that features learned from predominant senses can discriminate the domain of a document and thus improve the overall performance of text categorization. We propose a multi-task learning framework based on the Transformer neural network model, which trains a model to simultaneously categorize documents and predict a predominant sense for each word. Experimental results on four benchmark datasets show that our method is comparable to state-of-the-art categorization approaches, and that our model works especially well for categorizing multi-label documents.
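The multi-task idea in the abstract can be sketched as a shared encoder feeding two heads: one classifies the document, the other predicts a predominant sense for every word, and training minimizes a weighted sum of the two losses. The NumPy sketch below illustrates only this objective structure; the dimensions, the auxiliary-loss weight, and the random stand-in for the Transformer encoder are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not taken from the paper).
seq_len, d_model = 8, 16        # words per document, encoder width
n_categories, n_senses = 4, 10  # document classes, sense inventory size

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_entropy(probs, target):
    # Negative log-likelihood of the gold label under `probs`.
    return -np.log(probs[..., target] + 1e-12)

# Stand-in for the shared Transformer encoder: one vector per word.
token_states = rng.standard_normal((seq_len, d_model))

# Main head: document categorization from the mean-pooled encoding.
W_doc = rng.standard_normal((d_model, n_categories)) * 0.1
doc_probs = softmax(token_states.mean(axis=0) @ W_doc)

# Auxiliary head: a predominant-sense prediction for each word.
W_sense = rng.standard_normal((d_model, n_senses)) * 0.1
sense_probs = softmax(token_states @ W_sense)  # shape (seq_len, n_senses)

# Dummy gold labels, for illustration only.
doc_label = 2
sense_labels = rng.integers(0, n_senses, size=seq_len)

# Multi-task objective: main loss plus a weighted auxiliary loss.
aux_weight = 0.5  # assumed trade-off hyperparameter
aux_loss = np.mean(
    [cross_entropy(sense_probs[i], s) for i, s in enumerate(sense_labels)]
)
loss = cross_entropy(doc_probs, doc_label) + aux_weight * aux_loss
print(float(loss) > 0.0)
```

In a real model both heads would share the encoder's parameters during backpropagation, so the sense-prediction signal shapes the representations used for categorization.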
Anthology ID:
P19-1105
Volume:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2019
Address:
Florence, Italy
Editors:
Anna Korhonen, David Traum, Lluís Màrquez
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1109–1119
URL:
https://aclanthology.org/P19-1105
DOI:
10.18653/v1/P19-1105
Cite (ACL):
Kazuya Shimura, Jiyi Li, and Fumiyo Fukumoto. 2019. Text Categorization by Learning Predominant Sense of Words as Auxiliary Task. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1109–1119, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Text Categorization by Learning Predominant Sense of Words as Auxiliary Task (Shimura et al., ACL 2019)
PDF:
https://aclanthology.org/P19-1105.pdf
Video:
https://aclanthology.org/P19-1105.mp4
Code:
ShimShim46/TRF_Multitask
Data:
RCV1