Neo Watanabe


2024

pdf bib
AIWolfDial 2024: Summary of Natural Language Division of 6th International AIWolf Contest
Yoshinobu Kano | Yuto Sahashi | Neo Watanabe | Kaito Kagaminuma | Claus Aranha | Daisuke Katagami | Kei Harada | Michimasa Inaba | Takeshi Ito | Hirotaka Osawa | Takashi Otsuki | Fujio Toriumi
Proceedings of the 2nd International AIWolfDial Workshop

We held our 6th annual AIWolf international contest to automatically play the Werewolf game “Mafia”, where players try finding liars via conversations, aiming at promoting developments in creating agents of more natural conversations in higher level, such as longer contexts, personal relationships, semantics, pragmatics, and logics, revealing the capabilities and limits of the generative AIs. In our Natural Language Division of the contest, we had eight Japanese speaking agent teams, and five English speaking agents, to mutually run games. By using the game logs, we performed human subjective evaluations, win rates, and detailed log analysis. We found that the entire system performance has largely improved over the previous year, due to the recent advantages of the LLMs. There are several new ideas to improve the way using LLMs such as the summarization, characterization, and the logics outside LLMs, etc. However, it is not perfect at all yet; the generated talks are sometimes inconsistent with the game actions. Our future work includes to reveal the capability of the LLMs, whether they can make the duality of the “liar”, in other words, holding a “true” and a “false” circumstances of the agent at the same time, even holding what these circumstances look like from other agents.

pdf bib
Werewolf Game Agent by Generative AI Incorporating Logical Information Between Players
Neo Watanabe | Yoshinobu Kano
Proceedings of the 2nd International AIWolfDial Workshop

In recent years, AI models based on GPT have advanced rapidly. These models are capable of generating text, translating between different languages, and answering questions with high accuracy. However, the process behind their outputs remains a black box, making it difficult to ascertain the data influencing their responses. These AI models do not always produce accurate outputs and are known for generating incorrect information, known as hallucinations, whose causes are hard to pinpoint. Moreover, they still face challenges in solving complex problems that require step-by-step reasoning, despite various improvements like the Chain-of-Thought approach. There’s no guarantee that these models can independently perform logical reasoning from scratch, raising doubts about the reliability and accuracy of their inferences. To address these concerns, this study proposes the incorporation of an explicit logical structure into the AI’s text generation process. As a validation experiment, a text-based agent capable of playing the Werewolf game, which requires deductive reasoning, was developed using GPT-4. By comparing the model combined with an external explicit logical structure and a baseline that lacks such a structure, the proposed method demonstrated superior reasoning capabilities in subjective evaluations, suggesting the effectiveness of adding an explicit logical framework to the conventional AI models.

2023

pdf bib
AIWolfDial 2023: Summary of Natural Language Division of 5th International AIWolf Contest
Yoshinobu Kano | Neo Watanabe | Kaito Kagaminuma | Claus Aranha | Jaewon Lee | Benedek Hauer | Hisaichi Shibata | Soichiro Miki | Yuta Nakamura | Takuya Okubo | Soga Shigemura | Rei Ito | Kazuki Takashima | Tomoki Fukuda | Masahiro Wakutani | Tomoya Hatanaka | Mami Uchida | Mikio Abe | Akihiro Mikami | Takashi Otsuki | Zhiyang Qi | Kei Harada | Michimasa Inaba | Daisuke Katagami | Hirotaka Osawa | Fujio Toriumi
Proceedings of the 16th International Natural Language Generation Conference: Generation Challenges

We held our 5th annual AIWolf international contest to automatically play the Werewolf game “Mafia”, where players try finding liars via conversations, aiming at promoting developments in creating agents of more natural conversations in higher level, such as longer contexts, personal relationships, semantics, pragmatics, and logics, revealing the capabilities and limits of the generative AIs. In our Natural Language Division of the contest, we had six Japanese speaking agents from five teams, and three English speaking agents, to mutually run games. By using the game logs, We performed human subjective evaluations and detailed log analysis. We found that the entire system performance has largely improved over the previous year, due to the recent advantages of the LLMs. However, it is not perfect at all yet; the generated talks are sometimes inconsistent with the game actions, it is still doubtful that the agents could infer roles by logics rather than superficial utterance generations. It is not explicitly observed in this log but it would be still difficult to make an agent telling a lie, pretend as a villager but it has an opposite goal inside. Our future work includes to reveal the capability of the LLMs, whether they can make the duality of the “liar”, in other words, holding a “true” and a “false” circumstances of the agent at the same time, even holding what these circumstances look like from other agents.