Yusuke Mukuta


2024

Content-Specific Humorous Image Captioning Using Incongruity Resolution Chain-of-Thought
Kohtaro Tanaka | Kohei Uehara | Lin Gu | Yusuke Mukuta | Tatsuya Harada
Findings of the Association for Computational Linguistics: NAACL 2024

Although automated image captioning methods have benefited considerably from the development of large language models (LLMs), generating humorous captions is still a challenging task. Humorous captions generated by humans are unique to the image and reflect the content of the image. However, captions generated using previous captioning models tend to be generic. Therefore, we propose incongruity-resolution chain-of-thought (IRCoT) as a novel prompting framework that creates content-specific resolutions from fine details extracted from an image. Furthermore, we integrate logit bias and negative sampling to suppress the output of generic resolutions. The results of experiments with GPT-4V demonstrate that our proposed framework effectively generates humorous captions tailored to the content of specific input images.

2022

Learning to Evaluate Humor in Memes Based on the Incongruity Theory
Kohtaro Tanaka | Hiroaki Yamane | Yusuke Mori | Yusuke Mukuta | Tatsuya Harada
Proceedings of the Second Workshop on When Creative AI Meets Conversational AI

Memes are a widely used means of communication on social media platforms, and are known for their ability to “go viral”. In prior work, researchers have aimed to develop an AI system to understand humor in memes. However, existing methods are limited by the reliability and consistency of the annotations in the dataset used to train the underlying models. Moreover, they do not explicitly take advantage of the incongruity between images and their captions, which is known to be an important element of humor in memes. In this study, we first gathered real-valued humor annotations of 7,500 memes through a crowdwork platform. Based on this data, we propose a refinement process to extract memes that are not influenced by interpersonal differences in the perception of humor and a method designed to extract and utilize incongruities between images and captions. The results of an experimental comparison with models using vision and language pretraining models show that our proposed approach outperformed other models in a binary classification task of evaluating whether a given meme was humorous.

2020

Finding and Generating a Missing Part for Story Completion
Yusuke Mori | Hiroaki Yamane | Yusuke Mukuta | Tatsuya Harada
Proceedings of the 4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

Creating a story is difficult. Professional writers often experience writer’s block. Thus, providing automatic support to writers is crucial but also challenging. Recently, in the field of generating and understanding stories, story completion (SC) has been proposed as a method for generating missing parts of an incomplete story. Despite this method’s usefulness in providing creative support, its applicability is currently limited because it requires the user to have prior knowledge of the missing part of a story. Writers do not always know which part of their writing is flawed. To overcome this problem, we propose a novel approach called “missing position prediction (MPP).” Given an incomplete story, we aim to predict the position of the missing part. We also propose a novel method for MPP and SC. We first conduct an experiment focusing on MPP, and our analysis shows that highly accurate predictions can be obtained when the missing part of a story is the beginning or the end. This suggests that if a story has a specific beginning or end, that part plays a significant role. We conduct an experiment on SC using MPP, and our proposed method demonstrates promising results.

2019

Toward a Better Story End: Collecting Human Evaluation with Reasons
Yusuke Mori | Hiroaki Yamane | Yusuke Mukuta | Tatsuya Harada
Proceedings of the 12th International Conference on Natural Language Generation

Creativity is an essential element of human nature used for many activities, such as telling a story. Based on human creativity, researchers have attempted to teach a computer to generate stories automatically or support this creative process. In this study, we undertake the task of story ending generation. This is a relatively new task, in which the last sentence of a given incomplete story is automatically generated. This is challenging because, in order to predict an appropriate ending, the generation method should comprehend the context of events. Despite the importance of this task, no clear evaluation metric has been established thus far; hence, it has remained an open problem. Therefore, we study the various elements involved in evaluating an automatic method for generating story endings. First, we introduce a baseline hierarchical sequence-to-sequence method for story ending generation. Then, we conduct a pairwise comparison against human-written endings, in which annotators choose the preferable ending. In addition to a quantitative evaluation, we conduct a qualitative evaluation by asking annotators to specify the reason for their choice. From the collected reasons, we discuss which elements the evaluation should focus on, with the aim of proposing effective metrics for the task.