Back to Patterns: Efficient Japanese Morphological Analysis with Feature-Sequence Trie

Naoki Yoshinaga


Abstract
Accurate neural models are much less efficient than non-neural models and are useless for processing billions of social media posts or handling user queries in real time with a limited budget. This study revisits the fastest pattern-based NLP methods to make them as accurate as possible, thus yielding a strikingly simple yet surprisingly accurate morphological analyzer for Japanese. The proposed method induces reliable patterns from a morphological dictionary and annotated data. Experimental results on two standard datasets confirm that the method exhibits comparable accuracy to learning-based baselines, while boasting a remarkable throughput of over 1,000,000 sentences per second on a single modern CPU. The source code is available at https://www.tkl.iis.u-tokyo.ac.jp/ynaga/jagger/
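To make the abstract's core idea concrete, below is a minimal, illustrative C++ sketch of greedy longest-match pattern lookup with a trie, the style of lookup a pattern-based analyzer performs at each position of the input. This is not the paper's implementation: Jagger's feature-sequence trie indexes richer patterns (not just raw characters) induced from a morphological dictionary and annotated data, and is engineered for the reported throughput. All names and the toy patterns here are hypothetical.

```cpp
// Minimal sketch of trie-based greedy longest-match segmentation/tagging.
// Illustrative only; the paper's feature-sequence trie and its induced
// patterns are more elaborate than this toy character trie.
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct TrieNode {
    std::map<char, std::unique_ptr<TrieNode>> next;
    int seg_len = 0;      // bytes to cut off when this pattern fires; 0 = no pattern ends here
    std::string pos;      // POS tag emitted by the pattern
};

struct PatternTrie {
    TrieNode root;

    // Register a pattern: when `pattern` matches a prefix of the remaining
    // input, output the next `seg_len` bytes as one morpheme tagged `pos`.
    void add(const std::string& pattern, int seg_len, const std::string& pos) {
        TrieNode* n = &root;
        for (char c : pattern) {
            auto& child = n->next[c];
            if (!child) child = std::make_unique<TrieNode>();
            n = child.get();
        }
        n->seg_len = seg_len;
        n->pos = pos;
    }

    // Longest-match lookup starting at input[start]; returns {seg_len, pos}.
    std::pair<int, std::string> lookup(const std::string& input, size_t start) const {
        const TrieNode* n = &root;
        std::pair<int, std::string> best{0, ""};
        for (size_t i = start; i < input.size(); ++i) {
            auto it = n->next.find(input[i]);
            if (it == n->next.end()) break;
            n = it->second.get();
            if (n->seg_len > 0) best = {n->seg_len, n->pos};
        }
        return best;
    }
};

int main() {
    PatternTrie trie;
    // Toy hand-written patterns; in the paper they are induced automatically
    // from a morphological dictionary and annotated data.
    trie.add("walk", 4, "VERB");
    trie.add("walked", 6, "VERB");
    trie.add("s", 1, "SUFFIX");

    std::string input = "walked";
    size_t pos = 0;
    while (pos < input.size()) {
        auto [len, tag] = trie.lookup(input, pos);
        if (len == 0) { len = 1; tag = "UNK"; }  // fall back to a single byte
        std::cout << input.substr(pos, len) << "\t" << tag << "\n";
        pos += len;
    }
    return 0;
}
```

Because analysis reduces to one trie traversal per output morpheme, with no search over lattice paths or neural inference, this kind of lookup is what allows throughput on the order of a million sentences per second on a single CPU.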
Anthology ID:
2023.acl-short.2
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
13–23
URL:
https://aclanthology.org/2023.acl-short.2
DOI:
10.18653/v1/2023.acl-short.2
Cite (ACL):
Naoki Yoshinaga. 2023. Back to Patterns: Efficient Japanese Morphological Analysis with Feature-Sequence Trie. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 13–23, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Back to Patterns: Efficient Japanese Morphological Analysis with Feature-Sequence Trie (Yoshinaga, ACL 2023)
PDF:
https://aclanthology.org/2023.acl-short.2.pdf
Video:
https://aclanthology.org/2023.acl-short.2.mp4