Takashi Kodama


2023

pdf bib
KWJA: A Unified Japanese Analyzer Based on Foundation Models
Nobuhiro Ueda | Kazumasa Omura | Takashi Kodama | Hirokazu Kiyomaru | Yugo Murawaki | Daisuke Kawahara | Sadao Kurohashi
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

We present KWJA, a high-performance unified Japanese text analyzer based on foundation models.KWJA supports a wide range of tasks, including typo correction, word segmentation, word normalization, morphological analysis, named entity recognition, linguistic feature tagging, dependency parsing, PAS analysis, bridging reference resolution, coreference resolution, and discourse relation analysis, making it the most versatile among existing Japanese text analyzers.KWJA solves these tasks in a multi-task manner but still achieves competitive or better performance compared to existing analyzers specialized for each task.KWJA is publicly available under the MIT license at https://github.com/ku-nlp/kwja.

pdf bib
Is a Knowledge-based Response Engaging?: An Analysis on Knowledge-Grounded Dialogue with Information Source Annotation
Takashi Kodama | Hirokazu Kiyomaru | Yin Jou Huang | Taro Okahisa | Sadao Kurohashi
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

Currently, most knowledge-grounded dialogue response generation models focus on reflecting given external knowledge. However, even when conveying external knowledge, humans integrate their own knowledge, experiences, and opinions with external knowledge to make their utterances engaging. In this study, we analyze such human behavior by annotating the utterances in an existing knowledge-grounded dialogue corpus. Each entity in the corpus is annotated with its information source, either derived from external knowledge (database-derived) or the speaker’s own knowledge, experiences, and opinions (speaker-derived). Our analysis shows that the presence of speaker-derived information in the utterance improves dialogue engagingness. We also confirm that responses generated by an existing model, which is trained to reflect the given knowledge, cannot include speaker-derived information in responses as often as humans do.

2022

pdf bib
Constructing a Culinary Interview Dialogue Corpus with Video Conferencing Tool
Taro Okahisa | Ribeka Tanaka | Takashi Kodama | Yin Jou Huang | Sadao Kurohashi
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Interview is an efficient way to elicit knowledge from experts of different domains. In this paper, we introduce CIDC, an interview dialogue corpus in the culinary domain in which interviewers play an active role to elicit culinary knowledge from the cooking expert. The corpus consists of 308 interview dialogues (each about 13 minutes in length), which add up to a total of 69,000 utterances. We use a video conferencing tool for data collection, which allows us to obtain the facial expressions of the interlocutors as well as the screen-sharing contents. To understand the impact of the interlocutors’ skill level, we divide the experts into “semi-professionals’” and “enthusiasts” and the interviewers into “skilled interviewers” and “unskilled interviewers.” For quantitative analysis, we report the statistics and the results of the post-interview questionnaire. We also conduct qualitative analysis on the collected interview dialogues and summarize the salient patterns of how interviewers elicit knowledge from the experts. The corpus serves the purpose to facilitate future research on the knowledge elicitation mechanism in interview dialogues.

pdf bib
Explicit Use of Topicality in Dialogue Response Generation
Takumi Yoshikoshi | Hayato Atarashi | Takashi Kodama | Sadao Kurohashi
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop

The current chat dialogue systems implicitly consider the topic given the context, but not explicitly. As a result, these systems often generate inconsistent responses with the topic of the moment. In this study, we propose a dialogue system that responds appropriately following the topic by selecting the entity with the highest “topicality.” In topicality estimation, the model is trained through self-supervised learning that regards entities that appear in both context and response as the topic entities. In response generation, the model is trained to generate topic-relevant responses based on the estimated topicality. Experimental results show that our proposed system can follow the topic more than the existing dialogue system that considers only the context.

pdf bib
Construction of Hierarchical Structured Knowledge-based Recommendation Dialogue Dataset and Dialogue System
Takashi Kodama | Ribeka Tanaka | Sadao Kurohashi
Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering

We work on a recommendation dialogue system to help a user understand the appealing points of some target (e.g., a movie). In such dialogues, the recommendation system needs to utilize structured external knowledge to make informative and detailed recommendations. However, there is no dialogue dataset with structured external knowledge designed to make detailed recommendations for the target. Therefore, we construct a dialogue dataset, Japanese Movie Recommendation Dialogue (JMRD), in which the recommender recommends one movie in a long dialogue (23 turns on average). The external knowledge used in this dataset is hierarchically structured, including title, casts, reviews, and plots. Every recommender’s utterance is associated with the external knowledge related to the utterance. We then create a movie recommendation dialogue system that considers the structure of the external knowledge and the history of the knowledge used. Experimental results show that the proposed model is superior in knowledge selection to the baseline models.

2020

pdf bib
A System for Worldwide COVID-19 Information Aggregation
Akiko Aizawa | Frederic Bergeron | Junjie Chen | Fei Cheng | Katsuhiko Hayashi | Kentaro Inui | Hiroyoshi Ito | Daisuke Kawahara | Masaru Kitsuregawa | Hirokazu Kiyomaru | Masaki Kobayashi | Takashi Kodama | Sadao Kurohashi | Qianying Liu | Masaki Matsubara | Yusuke Miyao | Atsuyuki Morishima | Yugo Murawaki | Kazumasa Omura | Haiyue Song | Eiichiro Sumita | Shinji Suzuki | Ribeka Tanaka | Yu Tanaka | Masashi Toyoda | Nobuhiro Ueda | Honai Ueoka | Masao Utiyama | Ying Zhong
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020

The global pandemic of COVID-19 has made the public pay close attention to related news, covering various domains, such as sanitation, treatment, and effects on education. Meanwhile, the COVID-19 condition is very different among the countries (e.g., policies and development of the epidemic), and thus citizens would be interested in news in foreign countries. We build a system for worldwide COVID-19 information aggregation containing reliable articles from 10 regions in 7 languages sorted by topics. Our reliable COVID-19 related website dataset collected through crowdsourcing ensures the quality of the articles. A neural machine translation module translates articles in other languages into Japanese and English. A BERT-based topic-classifier trained on our article-topic pair dataset helps users find their interested information efficiently by putting articles into different categories.

pdf bib
Generating Responses that Reflect Meta Information in User-Generated Question Answer Pairs
Takashi Kodama | Ryuichiro Higashinaka | Koh Mitsuda | Ryo Masumura | Yushi Aono | Ryuta Nakamura | Noritake Adachi | Hidetoshi Kawabata
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper concerns the problem of realizing consistent personalities in neural conversational modeling by using user generated question-answer pairs as training data. Using the framework of role play-based question answering, we collected single-turn question-answer pairs for particular characters from online users. Meta information was also collected such as emotion and intimacy related to question-answer pairs. We verified the quality of the collected data and, by subjective evaluation, we also verified their usefulness in training neural conversational models for generating utterances reflecting the meta information, especially emotion.