Improving Neural Language Processing with Named Entities

Kyoumoto Matsushita, Takuya Makino, Tomoya Iwakura


Abstract
Pretraining-based neural network models have demonstrated state-of-the-art (SOTA) performance on natural language processing (NLP) tasks. The sentence representation most frequently used by neural NLP methods is a sequence of subwords, which differs from the representations of non-neural methods, which are built with basic NLP technologies such as part-of-speech (POS) tagging, named entity (NE) recognition, and parsing. Most neural NLP models receive only vectors encoded from a sequence of subwords obtained from the input text. However, basic NLP information such as POS tags, NEs, and parsing results is not explicitly available from the large unlabeled text used to train pretraining-based models. This paper explores the use of NEs on two Japanese tasks, document classification and headline generation, with Transformer-based models, in order to reveal the effectiveness of basic NLP information. Experimental results with eight basic NEs and approximately 200 extended NEs show that NEs improve accuracy even when a large pretraining-based model trained on 70 GB of text data is used.
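As a rough illustration of the idea in the abstract, the sketch below shows one common way to expose NE information to a subword-based model: wrapping each entity span in boundary marker tokens before subword tokenization. This is a hypothetical Python sketch, not necessarily the authors' method; the marker format and the insert_ne_markers helper are assumptions for illustration.

    from typing import List, Tuple

    def insert_ne_markers(text: str, entities: List[Tuple[int, int, str]]) -> str:
        """Wrap each (start, end, label) character span in <label> ... </label> markers.

        Spans in `entities` are assumed to be non-overlapping and sorted by start offset.
        """
        out, cursor = [], 0
        for start, end, label in entities:
            out.append(text[cursor:start])                         # text before the entity
            out.append(f"<{label}> {text[start:end]} </{label}>")  # marked entity span
            cursor = end
        out.append(text[cursor:])                                  # trailing text
        return "".join(out)

    # One ORGANIZATION entity covering characters 0-7 ("Fujitsu").
    print(insert_ne_markers("Fujitsu released a new model.", [(0, 7, "ORG")]))
    # -> <ORG> Fujitsu </ORG> released a new model.

The marker tokens (e.g., <ORG> and </ORG>) would then be registered as special tokens in the subword vocabulary, so that the pretrained encoder receives NE boundaries and types explicitly alongside the usual subwords.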
Anthology ID:
2021.ranlp-1.107
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Month:
September
Year:
2021
Address:
Held Online
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
940–949
URL:
https://aclanthology.org/2021.ranlp-1.107
Cite (ACL):
Kyoumoto Matsushita, Takuya Makino, and Tomoya Iwakura. 2021. Improving Neural Language Processing with Named Entities. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 940–949, Held Online. INCOMA Ltd.
Cite (Informal):
Improving Neural Language Processing with Named Entities (Matsushita et al., RANLP 2021)
PDF:
https://aclanthology.org/2021.ranlp-1.107.pdf