Tomoko Ohkuma - ACL Anthology

2025

Can LLMs Learn from Their Mistakes? Self-Correcting Instruction Tuning for Named Entity Recognition
Takumi Takahashi | Tomoki Taniguchi | Chencheng Zhu | Tomoko Ohkuma
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

Recent instruction-tuned large language models (LLMs) have demonstrated remarkable performance on various downstream tasks, including named entity recognition (NER). However, previous approaches often generate incorrect predictions, particularly regarding entity boundaries and types. Many of these errors can be corrected to match the ground truth by revising the entity boundaries and/or types. In this paper, we propose a self-correcting instruction tuning approach that simultaneously learns to perform NER and correct errors through natural language instructions. Self-correcting instruction tuning requires only a standard annotated NER dataset. Supervision for self-correction can be automatically generated from error patterns observed in LLMs fine-tuned solely on NER tasks. We conducted extensive experiments on eight NER datasets with two LLMs to validate the effectiveness of the proposed approach. The results demonstrate that the proposed approach enhances NER performance by effectively correcting prediction errors and substantially reducing false positives. We further analyze the self-correction behavior to better understand how the models improve performance.

Overview of PBIG Shared Task at AgentScen 2025: Product Business Idea Generation from Patents
Wataru Hirota | Chung-Chi Chen | Tomoko Ohkuma | Tomoki Taniguchi | Tatsuya Ishigaki
Proceedings of the 2nd Workshop on Agent AI for Scenario Planning

2022

Factual Accuracy is not Enough: Planning Consistent Description Order for Radiology Report Generation
Toru Nishino | Yasuhide Miura | Tomoki Taniguchi | Tomoko Ohkuma | Yuki Suzuki | Shoji Kido | Noriyuki Tomiyama
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Radiology report generation systems have the potential to reduce the workload of radiologists by automatically describing the findings in medical images. To broaden the application of the report generation system, the system should generate reports that are not only factually accurate but also chronologically consistent, describing images that are presented in time order, that is, the correct order. We employ a planning-based radiology report generation system that generates the overall structure of reports as “plans” prior to generating reports that are accurate and consistent in order. Additionally, we propose a novel reinforcement learning and inference method, Coordinated Planning (CoPlan), that includes a content planner and a text generator to train and infer in a coordinated manner to alleviate the cascading of errors that are often inherent in planning-based models. We conducted experiments with single-phase diagnostic reports in which the factual accuracy is critical and multi-phase diagnostic reports in which the description order is critical. Our proposed CoPlan improves the content order score by 5.1 pt in time series critical scenarios and the clinical factual accuracy F-score by 9.1 pt in time series irrelevant scenarios, compared with those of the baseline models without CoPlan.

2021

Quantifying Appropriateness of Summarization Data for Curriculum Learning
Ryuji Kano | Takumi Takahashi | Toru Nishino | Motoki Taniguchi | Tomoki Taniguchi | Tomoko Ohkuma
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Much research has reported that the training data of summarization models are noisy; summaries often do not reflect what is written in the source texts. We propose an effective method of curriculum learning to train summarization models from such noisy data. Curriculum learning is used to train sequence-to-sequence models with noisy data. In translation tasks, previous research quantified noise of the training data using two models trained with noisy and clean corpora. Because such corpora do not exist in summarization fields, we propose a model that can quantify noise from a single noisy corpus. We conduct experiments on three summarization models, one pretrained model and two non-pretrained models, and verify that our method improves the performance. Furthermore, we analyze how different curricula affect the performance of pretrained and non-pretrained summarization models. Our result on human evaluation also shows that our method improves the performance of summarization models.

2020

Reinforcement Learning with Imbalanced Dataset for Data-to-Text Medical Report Generation
Toru Nishino | Ryota Ozaki | Yohei Momoki | Tomoki Taniguchi | Ryuji Kano | Norihisa Nakano | Yuki Tagawa | Motoki Taniguchi | Tomoko Ohkuma | Keigo Nakamura
Findings of the Association for Computational Linguistics: EMNLP 2020

Automated generation of medical reports that describe the findings in the medical images helps radiologists by alleviating their workload. A medical report generation system should generate correct and concise reports. However, data imbalance makes it difficult to train models accurately. Medical datasets are commonly imbalanced in their finding labels because incidence rates differ among diseases; moreover, the ratios of abnormalities to normalities are significantly imbalanced. We propose a novel reinforcement learning method with a reconstructor to improve the clinical correctness of generated reports to train the data-to-text module with a highly imbalanced dataset. Moreover, we introduce a novel data augmentation strategy for reinforcement learning to additionally train the model on infrequent findings. From the perspective of practical use, we employ a Two-Stage Medical Report Generator (TS-MRGen) for controllable report generation from input images. TS-MRGen consists of two separate stages: an image diagnosis module and a data-to-text module. Radiologists can modify the image diagnosis module results to control the reports that the data-to-text module generates. We conduct an experiment with two medical datasets to assess the data-to-text module and the entire two-stage model. Results demonstrate that the reports generated by our model describe the findings in the input image more correctly.

Distinctive Slogan Generation with Reconstruction
Shotaro Misawa | Yasuhide Miura | Tomoki Taniguchi | Tomoko Ohkuma
Proceedings of Workshop on Natural Language Processing in E-Commerce

E-commerce sites include advertising slogans along with information regarding an item. Slogans can attract viewers’ attention to increase sales or visits by emphasizing advantages of an item. The aim of this study is to generate a slogan from a description of an item. To generate a slogan, we apply an encoder–decoder model, which has shown effectiveness in many kinds of natural language generation tasks, such as abstractive summarization. However, the slogan generation task has three characteristics that distinguish it from other natural language generation tasks: distinctiveness, topic emphasis, and style difference. To handle these three characteristics, we propose a compressed representation–based reconstruction model with refer–attention and conversion layers. The results of the experiments indicate that, based on automatic and human evaluation, our method achieves higher performance than conventional methods.

Identifying Implicit Quotes for Unsupervised Extractive Summarization of Conversations
Ryuji Kano | Yasuhide Miura | Tomoki Taniguchi | Tomoko Ohkuma
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

We propose Implicit Quote Extractor, an end-to-end unsupervised extractive neural summarization model for conversational texts. When we reply to posts, quotes are used to highlight important parts of texts. We aim to extract quoted sentences as summaries. Most replies do not explicitly include quotes, so it is difficult to use quotes as supervision. However, even if it is not explicitly shown, replies always refer to certain parts of texts; we call them implicit quotes. Implicit Quote Extractor aims to extract implicit quotes as summaries. The training task of the model is to predict whether a reply candidate is a true reply to a post. For prediction, the model has to choose a few sentences from the post. To predict accurately, the model learns to extract sentences that replies frequently refer to. We evaluate our model on two email datasets and one social media dataset, and confirm that our model is useful for extractive summarization. We further discuss two topics: one is whether quote extraction is an important factor for summarization, and the other is whether our model can capture salient sentences that conventional methods cannot.

Aspect-Similarity-Aware Historical Influence Modeling for Rating Prediction
Ryo Shimura | Shotaro Misawa | Masahiro Sato | Tomoki Taniguchi | Tomoko Ohkuma
Proceedings of Workshop on Natural Language Processing in E-Commerce

Many e-commerce services provide customer review systems. Previous laboratory studies have indicated that the ratings recorded by these systems differ from the actual evaluations of the users, owing to the influence of historical ratings in the system. Some studies have proposed using real-world datasets to model rating prediction. Herein, we propose an aspect-similarity-aware historical influence model for rating prediction using natural language processing techniques. In general, each user provides a rating considering different aspects. Thus, it can be assumed that historical ratings provided considering similar aspects to those of later ones will influence evaluations of users more. By focusing on the review-topic similarities, we show that our method predicts ratings more accurately than the previous historical-inference-aware model. In addition, we examine whether our model can predict the “intrinsic rating,” which is the rating that would be given if users were not influenced by historical ratings. We performed an intrinsic rating prediction task, and showed that our model achieved improved performance. Our method can be useful to debias user ratings collected by customer review systems. The debiased ratings help users to make decisions properly and systems to provide helpful recommendations. This might improve the user experience of e-commerce services.

A Large-Scale Corpus of E-mail Conversations with Standard and Two-Level Dialogue Act Annotations
Motoki Taniguchi | Yoshihiro Ueda | Tomoki Taniguchi | Tomoko Ohkuma
Proceedings of the 28th International Conference on Computational Linguistics

We present a large-scale corpus of e-mail conversations with domain-agnostic and two-level dialogue act (DA) annotations towards the goal of a better understanding of asynchronous conversations. We annotate over 6,000 messages and 35,000 sentences from more than 2,000 threads. For domain-independent and application-independent DA annotations, we choose ISO standard 24617-2 as the annotation scheme. To assess the difficulty of DA recognition on our corpus, we evaluate several models, including a pre-trained contextual representation model, as our baselines. The experimental results show that BERT outperforms other neural network models, including previous state-of-the-art models, but falls short of human performance. We also demonstrate that DA tags of two-level granularity enable a DA recognition model to learn efficiently by using multi-task learning. An evaluation of a model trained on our corpus against other domains of asynchronous conversation reveals the domain independence of our DA annotations.

2019

Relation Prediction for Unseen-Entities Using Entity-Word Graphs
Yuki Tagawa | Motoki Taniguchi | Yasuhide Miura | Tomoki Taniguchi | Tomoko Ohkuma | Takayuki Yamamoto | Keiichi Nemoto
Proceedings of the Thirteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-13)

Knowledge graphs (KGs) are generally used for various NLP tasks. However, as KGs still miss some information, it is necessary to develop Knowledge Graph Completion (KGC) methods. Most KGC research does not focus on Out-of-KG entities (Unseen-entities); however, we need a method that can predict the relation for the entity pairs containing Unseen-entities to automatically add new entities to the KGs. In this study, we focus on relation prediction and propose a method to learn entity representations via a graph structure that uses Seen-entities, Unseen-entities and words as nodes created from the descriptions of all entities. In the experiments, our method shows a significant improvement in the relation prediction for the entity pairs containing Unseen-entities.

Keeping Consistency of Sentence Generation and Document Classification with Multi-Task Learning
Toru Nishino | Shotaro Misawa | Ryuji Kano | Tomoki Taniguchi | Yasuhide Miura | Tomoko Ohkuma
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

The automated generation of information indicating the characteristics of articles such as headlines, key phrases, summaries and categories helps writers to alleviate their workload. Previous research has tackled these tasks using neural abstractive summarization and classification methods. However, the outputs may be inconsistent if they are generated individually. The purpose of our study is to generate multiple outputs consistently. We introduce a multi-task learning model with a shared encoder and multiple decoders for each task. We propose a novel loss function called hierarchical consistency loss to maintain consistency among the attention weights of the decoders. To evaluate the consistency, we employ a human evaluation. The results show that our model generates more consistent headlines, key phrases and categories. In addition, our model outperforms the baseline model on the ROUGE scores, and generates more adequate and fluent headlines.

CLER: Cross-task Learning with Expert Representation to Generalize Reading and Understanding
Takumi Takahashi | Motoki Taniguchi | Tomoki Taniguchi | Tomoko Ohkuma
Proceedings of the 2nd Workshop on Machine Reading for Question Answering

This paper describes our model for the reading comprehension task of the MRQA shared task. We propose CLER, which stands for Cross-task Learning with Expert Representation for the generalization of reading and understanding. To generalize its capabilities, the proposed model is composed of three key ideas: multi-task learning, mixture of experts, and ensemble. In-domain datasets are used to train and validate our model, and other out-of-domain datasets are used to validate the generalization of our model’s performance. In a submission run result, the proposed model achieved an average F1 score of 66.1% in the out-of-domain setting, which is a 4.3 percentage point improvement over the official BERT baseline model.

2018

Harnessing Popularity in Social Media for Extractive Summarization of Online Conversations
Ryuji Kano | Yasuhide Miura | Motoki Taniguchi | Yan-Ying Chen | Francine Chen | Tomoko Ohkuma
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We leverage a popularity measure in social media as a distant label for extractive summarization of online conversations. In social media, users can vote, share, or bookmark a post they prefer. The number of these actions is regarded as a measure of popularity. However, popularity is not determined solely by the content of a post, e.g., a text or an image it contains, but is highly based on its contexts, e.g., timing and authority. We propose the Disjunctive model, which computes the contribution of content and context separately. For evaluation, we build a dataset where the informativeness of comments is annotated. We evaluate the results with ranking metrics, and show that our model outperforms the baseline models which directly use popularity as a measure of informativeness.

Joint Modeling for Query Expansion and Information Extraction with Reinforcement Learning
Motoki Taniguchi | Yasuhide Miura | Tomoko Ohkuma
Proceedings of the First Workshop on Fact Extraction and VERification (FEVER)

Information extraction about an event can be improved by incorporating external evidence. In this study, we propose a joint model for pseudo-relevance feedback based query expansion and information extraction with reinforcement learning. Our model generates an event-specific query to effectively retrieve documents relevant to the event. We demonstrate that our model performs comparably to or better than the previous model on two publicly available datasets. Furthermore, we analyzed the influence of the retrieval effectiveness in our model on the extraction performance.

Integrating Tree Structures and Graph Structures with Neural Networks to Classify Discussion Discourse Acts
Yasuhide Miura | Ryuji Kano | Motoki Taniguchi | Tomoki Taniguchi | Shotaro Misawa | Tomoko Ohkuma
Proceedings of the 27th International Conference on Computational Linguistics

We proposed a model that integrates discussion structures with neural networks to classify discourse acts. Several attempts have been made in earlier works to analyze texts that are used in various discussions. The importance of discussion structures has been explored in those works but their methods required a sophisticated design to combine structural features with a classifier. Our model introduces tree learning approaches and a graph learning approach to directly capture discussion structures without structural features. In an evaluation to classify discussion discourse acts in Reddit, the model achieved improvements of 1.5% in accuracy and 2.2 in FB1 score compared to the previous best model. We further analyzed the model using an attention mechanism to inspect interactions among different learning approaches.

Integrating Entity Linking and Evidence Ranking for Fact Extraction and Verification
Motoki Taniguchi | Tomoki Taniguchi | Takumi Takahashi | Yasuhide Miura | Tomoko Ohkuma
Proceedings of the First Workshop on Fact Extraction and VERification (FEVER)

We describe here our system and results on the FEVER shared task. We prepared a pipeline system which consists of document selection, sentence retrieval, and recognizing textual entailment (RTE) components. A simple entity linking approach with text match is used as the document selection component; this component identifies relevant documents for a given claim by using mentioned entities as clues. The sentence retrieval component selects relevant sentences as candidate evidence from the documents based on TF-IDF. Finally, the RTE component selects evidence sentences by ranking the sentences and classifies the claim simultaneously. The experimental results show that our system achieved a FEVER score of 0.4016 and outperformed the official baseline system.

2017

Character-based Bidirectional LSTM-CRF with words and characters for Japanese Named Entity Recognition
Shotaro Misawa | Motoki Taniguchi | Yasuhide Miura | Tomoko Ohkuma
Proceedings of the First Workshop on Subword and Character Level Models in NLP

Recently, neural models have shown superior performance over conventional models in NER tasks. These models use CNN to extract sub-word information along with RNN to predict a tag for each word. However, these models have been tested almost entirely on English texts. It remains unclear whether they perform similarly in other languages. We worked on Japanese NER using neural models and discovered two obstacles of the state-of-the-art model. First, CNN is unsuitable for extracting Japanese sub-word information. Second, a model predicting a tag for each word cannot extract an entity when a part of a word composes an entity. The contributions of this work are (1) verifying the effectiveness of the state-of-the-art NER model for Japanese, and (2) proposing a neural model for predicting a tag for each character using word and character information. Experimentally obtained results demonstrate that our model outperforms the state-of-the-art neural English NER model in Japanese.

Unifying Text, Metadata, and User Network Representations with a Neural Network for Geolocation Prediction
Yasuhide Miura | Motoki Taniguchi | Tomoki Taniguchi | Tomoko Ohkuma
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We propose a novel geolocation prediction model using a complex neural network. Geolocation prediction in social media has attracted many researchers to use information of various types. Our model unifies text, metadata, and user network representations with an attention mechanism to overcome previous ensemble approaches. In an evaluation using two open datasets, the proposed model exhibited a maximum increase of 3.8% in accuracy and a maximum increase of 6.6% in accuracy@161 against previous models. We further analyzed several intermediate layers of our model, which revealed that their states capture some statistical characteristics of the datasets.

Using Social Networks to Improve Language Variety Identification with Neural Networks
Yasuhide Miura | Tomoki Taniguchi | Motoki Taniguchi | Shotaro Misawa | Tomoko Ohkuma
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

We propose a hierarchical neural network model for language variety identification that integrates information from a social network. Recently, language variety identification has enjoyed heightened popularity as an advanced task of language identification. The proposed model uses additional texts from a social network to improve language variety identification from two perspectives. First, they are used to introduce the effects of homophily. Second, they are used as expanded training data for shared layers of the proposed model. By introducing information from social networks, the model improved its accuracy by 1.67-5.56. Compared to state-of-the-art baselines, the improved performance is better in English and comparable in Spanish. Furthermore, we analyzed the cases of Portuguese and Arabic, where the model showed weak performance, and found that the effect of homophily is likely to be weak due to sparsity and noise compared to languages with strong performance.

2016

A Simple Scalable Neural Networks based Model for Geolocation Prediction in Twitter
Yasuhide Miura | Motoki Taniguchi | Tomoki Taniguchi | Tomoko Ohkuma
Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)

This paper describes a model that we submitted to W-NUT 2016 Shared task #1: Geolocation Prediction in Twitter. Our model classifies a tweet or a user to a city using a simple neural network structure with fully-connected layers and average pooling processes. From the findings of previous geolocation prediction approaches, we integrated various user metadata along with message texts and trained the model with them. In the test run of the task, the model achieved an accuracy of 40.91% and a median distance error of 69.50 km in message-level prediction and an accuracy of 47.55% and a median distance error of 16.13 km in user-level prediction. These results are moderate performances in terms of accuracy and best performances in terms of distance. The results show a promising extension of neural-network-based models for geolocation prediction where recent advances in neural networks can be added to enhance our current simple model.

Sentiment Analysis for Low Resource Languages: A Study on Informal Indonesian Tweets
Tuan Anh Le | David Moeljadi | Yasuhide Miura | Tomoko Ohkuma
Proceedings of the 12th Workshop on Asian Language Resources (ALR12)

This paper describes our attempt to build a sentiment analysis system for Indonesian tweets. With this system, we can study and identify sentiments and opinions in a text or document computationally. We used four thousand manually labeled tweets collected in February and March 2016 to build the model. Because of the variety of content in tweets, we analyze tweets into eight groups in total, including pos(itive), neg(ative), and neu(tral). Finally, we obtained 73.2% accuracy with Long Short Term Memory (LSTM) without normalizer.

MedNLPDoc: Japanese Shared Task for Clinical NLP
Eiji Aramaki | Yoshinobu Kano | Tomoko Ohkuma | Mizuki Morita
Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)

Due to the recent replacement of physical documents with electronic medical records (EMR), the importance of information processing in medical fields has increased. We have been organizing the MedNLP task series in NTCIR-10 and 11. These workshops were the first shared tasks that attempted to evaluate technologies that retrieve important information from medical reports written in Japanese. In this report, we describe the NTCIR-12 MedNLPDoc task, which is designed for more advanced and practical use in the medical fields. This task is regarded as a multi-labeling task for a patient record. This report presents results of the shared task, and discusses and illustrates remaining issues in the medical natural language processing field.

2015

A Weighted Combination of Text and Image Classifiers for User Gender Inference
Tomoki Taniguchi | Shigeyuki Sakaki | Ryosuke Shigenaka | Yukihiro Tsuboshita | Tomoko Ohkuma
Proceedings of the Fourth Workshop on Vision and Language

2014

Twitter User Gender Inference Using Combined Analysis of Text and Image Processing
Shigeyuki Sakaki | Yasuhide Miura | Xiaojun Ma | Keigo Hattori | Tomoko Ohkuma
Proceedings of the Third Workshop on Vision and Language

TeamX: A Sentiment Analyzer with Enhanced Lexicon Mapping and Weighting Scheme for Unbalanced Data
Yasuhide Miura | Shigeyuki Sakaki | Keigo Hattori | Tomoko Ohkuma
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

2013

Topic Modeling with Sentiment Clues and Relaxed Labeling Schema
Yasuhide Miura | Keigo Hattori | Tomoko Ohkuma | Hiroshi Masuichi
Proceedings of the 3rd Workshop on Sentiment Analysis where AI meets Psychology

Incorporating Knowledge Resources to Enhance Medical Information Extraction
Yasuhide Miura | Tomoko Ohkuma | Hiroshi Masuichi | Emiko Yamada Shinohara | Eiji Aramaki | Kazuhiko Ohe
The First Workshop on Natural Language Processing for Medical and Healthcare Fields

2010

Adverse-Effect Relations Extraction from Massive Clinical Records
Yasuhide Miura | Eiji Aramaki | Tomoko Ohkuma | Masatsugu Tonoike | Daigo Sugihara | Hiroshi Masuichi | Kazuhiko Ohe
Proceedings of the Second Workshop on NLP Challenges in the Information Explosion Era (NLPIX 2010)

2009

TEXT2TABLE: Medical Text Summarization System Based on Named Entity Recognition and Modality Identification
Eiji Aramaki | Yasuhide Miura | Masatsugu Tonoike | Tomoko Ohkuma | Hiroshi Mashuichi | Kazuhiko Ohe
Proceedings of the BioNLP 2009 Workshop

2004

Proceedings of the 18th Pacific Asia Conference on Language, Information and Computation
Hiroshi Masuichi | Tomoko Ohkuma | Kiyoshi Ishikawa | Yasunari Harada | Kei Yoshimoto
Proceedings of the 18th Pacific Asia Conference on Language, Information and Computation

2003

The Treatment of Japanese Focus Particles Based on Lexical-Functional Grammar
Tomoko Ohkuma | Hiroshi Masuichi | Hiroki Yoshimura | Yasunari Harada
Proceedings of the 17th Pacific Asia Conference on Language, Information and Computation
