Utilizing Visual Forms of Japanese Characters for Neural Review Classification

Yota Toyama, Makoto Miwa, Yutaka Sasaki


Abstract
We propose a novel method that exploits visual information of ideograms and logograms when analyzing Japanese review documents. Our method first converts font images of Japanese characters into character embeddings using convolutional neural networks. It then constructs document embeddings from the character embeddings based on Hierarchical Attention Networks, which represent the documents with attention mechanisms applied from the character level up to the sentence level. The document embeddings are finally used to predict the labels of documents. Our method provides a way to exploit visual features of characters in languages with ideograms and logograms. In the experiments, our method achieved accuracy comparable to that of a character embedding-based model while using far fewer parameters, since it does not need to store embeddings for thousands of characters.
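As an illustrative sketch of the first stage described above (not the authors' exact architecture), the following toy code maps a character's glyph bitmap to a fixed-size embedding using a small bank of convolution filters, a ReLU, and global average pooling; the filter count, kernel size, and random initialization are all assumptions for illustration. In the paper's setting, such an encoder replaces a lookup table over thousands of characters, which is where the parameter savings come from.

```python
import numpy as np

# Hypothetical stand-in for the paper's CNN character encoder:
# glyph bitmap -> conv filter bank -> ReLU -> global average pooling.
rng = np.random.default_rng(0)
N_FILTERS, K = 8, 5  # assumed: 8 filters of size 5x5
FILTERS = rng.standard_normal((N_FILTERS, K, K)) * 0.1

def conv2d_valid(img, kernel):
    """Naive 'valid' 2-D cross-correlation of img with a KxK kernel."""
    h, w = img.shape
    out = np.empty((h - K + 1, w - K + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + K, j:j + K] * kernel)
    return out

def char_embedding(bitmap):
    """Glyph bitmap (H x W, values in [0, 1]) -> embedding of length N_FILTERS."""
    feature_maps = [np.maximum(conv2d_valid(bitmap, f), 0.0) for f in FILTERS]
    return np.array([m.mean() for m in feature_maps])  # global average pooling

# A fake 32x32 "glyph" (a filled square standing in for a rendered character).
glyph = np.zeros((32, 32))
glyph[8:24, 8:24] = 1.0
emb = char_embedding(glyph)
print(emb.shape)  # (8,)
```

Because the embedding is computed from the glyph image rather than looked up by character ID, visually similar characters (e.g. kanji sharing radicals) tend to receive similar embeddings, which is the intuition the abstract appeals to.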
Anthology ID: I17-2064
Volume: Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Month: November
Year: 2017
Address: Taipei, Taiwan
Editors: Greg Kondrak, Taro Watanabe
Venue: IJCNLP
Publisher: Asian Federation of Natural Language Processing
Pages: 378–382
URL: https://aclanthology.org/I17-2064
Cite (ACL): Yota Toyama, Makoto Miwa, and Yutaka Sasaki. 2017. Utilizing Visual Forms of Japanese Characters for Neural Review Classification. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 378–382, Taipei, Taiwan. Asian Federation of Natural Language Processing.
Cite (Informal): Utilizing Visual Forms of Japanese Characters for Neural Review Classification (Toyama et al., IJCNLP 2017)
PDF: https://aclanthology.org/I17-2064.pdf