2019
Label Embedding using Hierarchical Structure of Labels for Twitter Classification
Taro Miyazaki | Kiminobu Makino | Yuka Takei | Hiroki Okamoto | Jun Goto
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Twitter is used for various applications such as disaster monitoring and news material gathering. In these applications, each Tweet is classified into pre-defined classes. These classes are semantically related to each other and can be organized into a hierarchical structure, which is regarded as important information. The label texts of the pre-defined classes also contain important clues for classification. Therefore, we propose a method that considers both the hierarchical structure of the labels and the label texts themselves. We evaluated our method on the Text REtrieval Conference (TREC) 2018 Incident Streams (IS) track dataset and found that it outperformed the methods of the conference participants.
2018
Classification of Tweets about Reported Events using Neural Networks
Kiminobu Makino | Yuka Takei | Taro Miyazaki | Jun Goto
Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text
We developed a system that automatically extracts “Event-describing Tweets,” which contain information about incidents or accidents, for creating news reports. Event-describing Tweets can be classified into “Reported-event Tweets” and “New-information Tweets.” Reported-event Tweets cite news agencies or user-generated content sites, and New-information Tweets are all other Event-describing Tweets. A system is needed to classify them so that creators of factual TV programs can use them in their productions. Proposing this Tweet classification task is one of the contributions of this paper: no prior work has addressed the same task, even though program creators and other collectors of event information must perform it to extract the information they need from social networking sites. To classify Tweets in this task, this paper proposes a method that takes both character and word sequences of Japanese Tweets as input and concatenates them using convolutional neural networks. This proposed method is another contribution of this paper. For comparison, character-only and word-only input methods and other neural networks are also used. Results show that a system using the proposed method and architectures can classify Tweets with an F1 score of 88%.
2017
Extracting Important Tweets for News Writers using Recurrent Neural Network with Attention Mechanism and Multi-task Learning
Taro Miyazaki | Shin Toriumi | Yuka Takei | Ichiro Yamada | Jun Goto
Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation
Tweet Extraction for News Production Considering Unreality
Yuka Takei | Taro Miyazaki | Ichiro Yamada | Jun Goto
Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation
2016
‘BonTen’ – Corpus Concordance System for ‘NINJAL Web Japanese Corpus’
Masayuki Asahara | Kazuya Kawahara | Yuya Takei | Hideto Masuoka | Yasuko Ohba | Yuki Torii | Toru Morii | Yuki Tanaka | Kikuo Maekawa | Sachi Kato | Hikari Konishi
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations
The National Institute for Japanese Language and Linguistics, Japan (NINJAL) has undertaken a corpus compilation project to construct a web corpus of ten billion words for linguistic research. The project is divided into four parts: page collection, linguistic analysis, development of the corpus concordance system, and preservation. This article presents the corpus concordance system, named ‘BonTen’, which enables the ten-billion-word corpus to be queried by a string, a sequence of morphological information, or a subtree of the syntactic dependency structure.