Tomoya Iwakura


On the (In)Effectiveness of Images for Text Classification
Chunpeng Ma | Aili Shen | Hiyori Yoshikawa | Tomoya Iwakura | Daniel Beck | Timothy Baldwin
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Images are core components of multi-modal learning in natural language processing (NLP), and results have varied substantially as to whether images improve NLP tasks or not. One confounding effect has been that previous NLP research has generally focused on sophisticated tasks (in varying settings), generally applied to English only. We focus on text classification, in the context of assigning named entity classes to a given Wikipedia page, where images generally complement the text and the Wikipedia page can be in one of a number of different languages. Our experiments across a range of languages show that images complement NLP models (including BERT) trained without external pre-training, but when combined with BERT models pre-trained on large-scale external data, images contribute nothing.


Transformer-based Approach for Predicting Chemical Compound Structures
Yutaro Omote | Kyoumoto Matsushita | Tomoya Iwakura | Akihiro Tamura | Takashi Ninomiya
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

By predicting chemical compound structures from their names, we can better comprehend chemical compounds written in text and identify the same chemical compound given different notations for database creation. Previous methods have predicted chemical compound structures from their names and represented them by Simplified Molecular Input Line Entry System (SMILES) strings. However, these methods mainly apply handcrafted rules and cannot predict the structures of chemical compound names not covered by the rules. Instead of handcrafted rules, we propose Transformer-based models that predict SMILES strings from chemical compound names. We improve the conventional Transformer-based model by introducing two features: (1) a loss function that constrains the number of atoms of each element in the structure, and (2) a multi-task learning approach that predicts both SMILES strings and InChI strings (another string representation of chemical compound structures). In evaluation experiments, our methods achieved higher F-measures than previous rule-based approaches (Open Parser for Systematic IUPAC Nomenclature and two commercially used products) and the conventional Transformer-based model. We release the dataset used in this paper as a benchmark for future research.
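The atom-count constraint described in the abstract can be illustrated with a toy penalty term: a hypothetical sketch of the idea (penalizing mismatched per-element atom counts in a decoded SMILES string), not the paper's actual loss function. The simplified SMILES parsing below covers only a handful of common elements and ignores bracket-atom details.

```python
import re
from collections import Counter

def atom_counts(smiles: str) -> Counter:
    """Count atoms of each element in a SMILES string (simplified:
    handles a few common elements; aromatic lowercase symbols are
    mapped back to their elements)."""
    counts = Counter()
    for sym in re.findall(r"Cl|Br|[BCNOPSFI]|[bcnops]", smiles):
        counts[sym.capitalize() if len(sym) == 1 else sym] += 1
    return counts

def atom_count_penalty(predicted: str, reference: str) -> int:
    """Sum of absolute per-element count differences, usable as an
    auxiliary penalty on decoded predictions."""
    p, r = atom_counts(predicted), atom_counts(reference)
    return sum(abs(p[e] - r[e]) for e in set(p) | set(r))
```

For example, predicting `CC` (ethane fragment) against reference `CCO` (ethanol) incurs a penalty of 1 for the missing oxygen.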


Multi-Task Learning for Chemical Named Entity Recognition with Chemical Compound Paraphrasing
Taiki Watanabe | Akihiro Tamura | Takashi Ninomiya | Takuya Makino | Tomoya Iwakura
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We propose a method to improve named entity recognition (NER) for chemical compounds using multi-task learning by jointly training a chemical NER model and a chemical compound paraphrase model. Our method enables the long short-term memory (LSTM) of the NER model to capture chemical compound paraphrases by sharing the parameters of the LSTM and character embeddings between the two models. The experimental results on the BioCreative IV’s CHEMDNER task show that our method improves chemical NER and achieves state-of-the-art performance.

Global Optimization under Length Constraint for Neural Text Summarization
Takuya Makino | Tomoya Iwakura | Hiroya Takamura | Manabu Okumura
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We propose a global optimization method under length constraint (GOLC) for neural text summarization models. GOLC increases the probabilities of generating summaries that have high evaluation scores, ROUGE in this paper, within a desired length. We compared GOLC with two optimization methods, maximum log-likelihood and minimum risk training, on CNN/Daily Mail and a Japanese single-document summarization dataset from The Mainichi Shimbun Newspapers. The experimental results show that a state-of-the-art neural summarization model optimized with GOLC generates fewer overlength summaries while maintaining the fastest processing speed: only 6.70% overlength summaries on CNN/Daily Mail and 7.8% on the long summaries of Mainichi, compared to approximately 20% to 50% on CNN/Daily Mail and 10% to 30% on Mainichi with the other optimization methods. We also demonstrate the importance of generating in-length summaries for post-editing with the Mainichi dataset, which is created under strict length constraints. The experimental results show an approximately 30% to 40% improvement in post-editing time through the use of in-length summaries.
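The length-constrained objective sketched in the abstract can be illustrated as a toy minimum-risk-style loss in which overlength candidates receive zero reward, so probability mass is pushed toward in-length, high-scoring summaries. This is a hypothetical illustration of the idea, not the exact GOLC formulation.

```python
import math

def length_constrained_reward(rouge: float, length: int, budget: int) -> float:
    """Toy reward: keep the evaluation score only if the summary
    fits within the length budget; overlength outputs get zero."""
    return rouge if length <= budget else 0.0

def expected_risk(candidates, budget):
    """Minimum-risk-style objective over sampled candidates.
    Each candidate is (log_prob, rouge, length); probabilities are
    renormalized over the sample via a softmax on log-probs."""
    weights = [math.exp(lp) for lp, _, _ in candidates]
    total = sum(weights)
    return -sum(w / total * length_constrained_reward(r, n, budget)
                for w, (_, r, n) in zip(weights, candidates))
```

Minimizing this risk rewards candidates only when they both score well and respect the length budget.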

A Fast and Accurate Partially Deterministic Morphological Analysis
Hajime Morita | Tomoya Iwakura
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

This paper proposes a partially deterministic morphological analysis method for improved processing speed. Maximum matching is a fast deterministic method for morphological analysis. However, the method tends to decrease performance due to lack of consideration of contextual information. In order to use maximum matching safely, we propose the use of Context Independent Strings (CISs), which are strings that do not have ambiguity in terms of morphological analysis. Our method first identifies CISs in a sentence using maximum matching without contextual information, then analyzes the unprocessed part of the sentence using a bi-gram-based morphological analysis model. We evaluate the method on a Japanese morphological analysis task. The experimental results show a 30% reduction of running time while maintaining improved accuracy.
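The maximum-matching step described above can be sketched as a toy greedy longest-match segmenter that commits only spans known to be unambiguous and flags the rest for a contextual model. This is an illustrative sketch, not the paper's implementation; the dictionary and the set of context-independent strings are hypothetical inputs.

```python
def longest_match_segment(text, dictionary, unambiguous):
    """Greedy longest-match segmentation. Spans listed in
    `unambiguous` (context-independent strings, CISs) can be
    committed deterministically; other spans are flagged False so
    a contextual (e.g. bi-gram) model can re-analyze them."""
    i, out = 0, []
    while i < len(text):
        for j in range(len(text), i, -1):
            candidate = text[i:j]
            if candidate in dictionary:
                out.append((candidate, candidate in unambiguous))
                i = j
                break
        else:  # no dictionary entry: emit a single character
            out.append((text[i], False))
            i += 1
    return out
```

For instance, with a dictionary containing 東京, 東京都, に, and 住む, and 住む marked as unambiguous, the text 東京都に住む is split into 東京都 / に / 住む, with only 住む committed deterministically.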


Detecting Heavy Rain Disaster from Social and Physical Sensor
Tomoya Iwakura | Seiji Okajima | Nobuyuki Igata | Kunihiro Takeda | Yuzuru Yamakage | Naoshi Morita
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

We present a system that assists in detecting heavy rain disasters, which is in real-world use in Japan. Our system selects tweets about heavy rain disasters with a document classifier. The locations mentioned in the selected tweets are then estimated by a location estimator. Finally, by combining the selected tweets with rainfall amounts given by physical sensors and a statistical analysis, our system provides users with visualized results for detecting heavy rain disasters.
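The three-stage pipeline (tweet classification, location estimation, fusion with physical rainfall sensors) can be sketched as follows. All function names and the threshold are hypothetical placeholders, not the deployed system's API.

```python
def detect_heavy_rain(tweets, classify, locate, rainfall_mm, threshold=50.0):
    """Toy pipeline: keep disaster-related tweets, geolocate them,
    and raise an alert for locations where sensor rainfall (mm)
    exceeds a threshold."""
    alerts = []
    for tweet in tweets:
        if not classify(tweet):       # stage 1: document classifier
            continue
        location = locate(tweet)      # stage 2: location estimator
        if location and rainfall_mm.get(location, 0.0) >= threshold:
            alerts.append((location, tweet))  # stage 3: sensor fusion
    return alerts
```

In the real system the alerts would feed a visualization layer rather than a plain list.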

Model Transfer with Explicit Knowledge of the Relation between Class Definitions
Hiyori Yoshikawa | Tomoya Iwakura
Proceedings of the 22nd Conference on Computational Natural Language Learning

This paper investigates learning methods for multi-class classification using labeled data for the target classification scheme and additional labeled data for a similar but different classification scheme (the support scheme). We show that if we have prior knowledge about the relation between the support and target classification schemes in the form of a class correspondence table, we can use it to improve model performance beyond the simple multi-task learning approach. Instead of learning individual classification layers for the support and target schemes, the proposed method converts the class label of each example in the support scheme into a set of candidate class labels in the target scheme via the class correspondence table, and then uses the candidate labels to learn the classification layer for the target scheme. We evaluate the proposed method on two NLP tasks. The experimental results show that our method effectively learns the target scheme, especially for classes that have a tight connection to certain support classes.
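The core idea, converting a support-scheme label into a set of candidate target labels and training against that set, can be sketched as a toy loss. The correspondence table and label names below are hypothetical examples, not the paper's schemes.

```python
import math

def candidate_label_nll(target_probs, support_label, correspondence):
    """Negative log-likelihood of the candidate target labels
    induced by a support-scheme label via the correspondence table:
    the model is rewarded for putting mass on any candidate."""
    candidates = correspondence[support_label]
    return -math.log(sum(target_probs[c] for c in candidates))
```

A support example labeled, say, "person" then maximizes the total probability of its candidate target classes, with no separate support-scheme classification layer needed.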

Chemical Compounds Knowledge Visualization with Natural Language Processing and Linked Data
Kazunari Tanaka | Tomoya Iwakura | Yusuke Koyanagi | Noriko Ikeda | Hiroyuki Shindo | Yuji Matsumoto
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)


An Eye-tracking Study of Named Entity Annotation
Takenobu Tokunaga | Hitoshi Nishikawa | Tomoya Iwakura
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

Utilising effective features in machine learning-based natural language processing (NLP) is crucial in achieving good performance for a given NLP task. The paper describes a pilot study on the analysis of eye-tracking data during named entity (NE) annotation, aiming at obtaining insights into effective features for the NE recognition task. The eye gaze data were collected from 10 annotators and analysed regarding working time and fixation distribution. The results of the preliminary qualitative analysis showed that human annotators tend to look at broader contexts around the target NE than recent state-of-the-art automatic NE recognition systems and to use predicate argument relations to identify the NE categories.


Comparison of Annotating Methods for Named Entity Corpora
Kanako Komiya | Masaya Suzuki | Tomoya Iwakura | Minoru Sasaki | Hiroyuki Shinnou
Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016 (LAW-X 2016)

Constructing a Japanese Basic Named Entity Corpus of Various Genres
Tomoya Iwakura | Kanako Komiya | Ryuichi Tachibana
Proceedings of the Sixth Named Entity Workshop

Big Community Data before World Wide Web Era
Tomoya Iwakura | Tetsuro Takahashi | Akihiro Ohtani | Kunio Matsui
Proceedings of the 12th Workshop on Asian Language Resources (ALR12)

This paper introduces the NIFTY-Serve corpus, a large data archive collected from Japanese discussion forums that operated via a Bulletin Board System (BBS) between 1987 and 2006. This corpus can be used in artificial intelligence research areas such as natural language processing and community analysis. The NIFTY-Serve corpus differs from data on the WWW in three ways: (1) it is essentially spam- and duplication-free because of strict data collection procedures, (2) it is historic user-generated data from before the WWW, and (3) it is a complete data set because the service has now shut down. We also introduce some example uses of the corpus.


A Boosted Semi-Markov Perceptron
Tomoya Iwakura
Proceedings of the Seventeenth Conference on Computational Natural Language Learning

A Boosting-based Algorithm for Classification of Semi-Structured Text using the Frequency of Substructures
Tomoya Iwakura
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013


A Named Entity Recognition Method based on Decomposition and Concatenation of Word Chunks
Tomoya Iwakura | Hiroya Takamura | Manabu Okumura
Proceedings of 5th International Joint Conference on Natural Language Processing

A Named Entity Recognition Method using Rules Acquired from Unlabeled Data
Tomoya Iwakura
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011


Fast Boosting-based Part-of-Speech Tagging and Text Chunking with Efficient Rule Representation for Sequential Labeling
Tomoya Iwakura
Proceedings of the International Conference RANLP-2009


A Fast Boosting-based Learner for Feature-Rich Tagging and Chunking
Tomoya Iwakura | Seishi Okamoto
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning


Text Simplification for Reading Assistance: A Project Note
Kentaro Inui | Atsushi Fujita | Tetsuro Takahashi | Ryu Iida | Tomoya Iwakura
Proceedings of the Second International Workshop on Paraphrasing