The 2nd International AIWolfDial Workshop (2024)



pdf (full)
bib (full)
Proceedings of the 2nd International AIWolfDial Workshop

pdf bib
Proceedings of the 2nd International AIWolfDial Workshop
Yoshinobu Kano

pdf bib
AIWolfDial 2024: Summary of Natural Language Division of 6th International AIWolf Contest
Yoshinobu Kano | Yuto Sahashi | Neo Watanabe | Kaito Kagaminuma | Claus Aranha | Daisuke Katagami | Kei Harada | Michimasa Inaba | Takeshi Ito | Hirotaka Osawa | Takashi Otsuki | Fujio Toriumi

We held the 6th annual international AIWolf contest, in which agents automatically play the Werewolf game "Mafia", where players try to identify liars through conversation. The contest aims to promote the development of agents capable of more natural, higher-level conversation, involving longer contexts, personal relationships, semantics, pragmatics, and logic, and to reveal the capabilities and limits of generative AIs. In the Natural Language Division of the contest, eight Japanese-speaking agent teams and five English-speaking agent teams played games against each other. Using the game logs, we performed human subjective evaluations, computed win rates, and conducted detailed log analysis. We found that overall system performance has improved considerably over the previous year, owing to recent advances in LLMs. Several new ideas improve how LLMs are used, such as summarization, characterization, and logic implemented outside the LLMs. However, the agents are far from perfect: the generated talks are sometimes inconsistent with the game actions. Our future work includes revealing whether LLMs can maintain the duality of a "liar", in other words, holding both the "true" and the "false" circumstances of the agent at the same time, as well as how these circumstances appear to the other agents.

pdf bib
Text Generation Indistinguishable from Target Person by Prompting Few Examples Using LLM
Yuka Tsubota | Yoshinobu Kano

To achieve smooth and natural communication between a dialogue system and a human, the dialogue system needs to behave in a more human-like manner. Recreating the personality of an actual person can be an effective way to achieve this. This study proposes a method to recreate a personality with a large language model (generative AI) without training, using only prompting techniques to keep the creation cost as low as possible. Collecting a large amount of dialogue data from a specific person is not easy, and training on it requires a significant amount of time. Therefore, we aim to recreate the personality of a specific individual without using dialogue data. The personality referred to in this paper denotes the image of a person that can be determined solely from the input and output of text dialogues. Our experiments revealed that prompts combining profile information, responses to a few questions, and speaking characteristics extracted from those responses can improve the reproducibility of a specific individual's personality.

pdf bib
Werewolf Game Agent by Generative AI Incorporating Logical Information Between Players
Neo Watanabe | Yoshinobu Kano

In recent years, AI models based on GPT have advanced rapidly. These models are capable of generating text, translating between different languages, and answering questions with high accuracy. However, the process behind their outputs remains a black box, making it difficult to ascertain the data influencing their responses. These AI models do not always produce accurate outputs and are prone to generating incorrect information, known as hallucinations, whose causes are hard to pinpoint. Moreover, they still face challenges in solving complex problems that require step-by-step reasoning, despite various improvements such as the Chain-of-Thought approach. There is no guarantee that these models can independently perform logical reasoning from scratch, raising doubts about the reliability and accuracy of their inferences. To address these concerns, this study proposes incorporating an explicit logical structure into the AI's text generation process. As a validation experiment, a text-based agent capable of playing the Werewolf game, which requires deductive reasoning, was developed using GPT-4. Comparing the model combined with an external explicit logical structure against a baseline lacking such a structure, the proposed method demonstrated superior reasoning capabilities in subjective evaluations, suggesting the effectiveness of adding an explicit logical framework to conventional AI models.
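The abstract does not spell out the logical structure used; as a minimal, hypothetical sketch, the external logic could be a model-enumeration module that derives the facts holding in every role assignment consistent with the players' claims, with those facts then injected into the LLM's prompt. The role set, claim semantics, and function names below are illustrative assumptions, not the paper's implementation.

```python
from itertools import permutations

# Assumed 5-player setup: one werewolf, one seer, three villagers.
ROLES = ("werewolf", "seer", "villager", "villager", "villager")

def consistent_worlds(players, claims):
    """Enumerate role assignments consistent with seer-style claims.

    claims: (speaker, target, reported_role) triples. Modeling assumptions
    (ours, not the paper's): only the real seer or the werewolf makes such
    claims, and the real seer always reports truthfully.
    """
    worlds = []
    for perm in set(permutations(ROLES)):
        assignment = dict(zip(players, perm))
        ok = True
        for speaker, target, reported in claims:
            role = assignment[speaker]
            if role not in ("seer", "werewolf"):
                ok = False  # a plain villager would not make a seer claim
                break
            if role == "seer" and assignment[target] != reported:
                ok = False  # the real seer never lies
                break
        if ok:
            worlds.append(assignment)
    return worlds

def certain_facts(players, claims):
    """Facts that hold in every consistent world; these strings could be
    injected into the LLM prompt as hard logical constraints."""
    worlds = consistent_worlds(players, claims)
    facts = []
    for p in players:
        roles = {w[p] for w in worlds}
        if len(roles) == 1:
            facts.append(f"{p} must be the {roles.pop()}")
    return facts
```

For example, if Agent1 and Agent2 both claim to be the seer and give conflicting results about Agent3, enumeration leaves a single consistent world, and the deduced facts can be prepended to the generation prompt so the LLM cannot contradict them.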

pdf bib
Enhancing Dialogue Generation in Werewolf Game Through Situation Analysis and Persuasion Strategies
Zhiyang Qi | Michimasa Inaba

Recent advancements in natural language processing, particularly with large language models (LLMs) like GPT-4, have significantly enhanced dialogue systems, enabling them to generate more natural and fluent conversations. Despite these improvements, challenges persist, such as managing continuous dialogues, retaining memory, and minimizing hallucinations. AIWolfDial2024 addresses these challenges by employing the Werewolf Game, an incomplete information game, to test the capabilities of LLMs in complex interactive environments. This paper introduces an LLM-based Werewolf Game AI in which each role is supported by situation analysis to aid response generation. Additionally, for the werewolf role, various persuasion strategies, including logical appeal, credibility appeal, and emotional appeal, are employed to effectively persuade other players to align with its actions.

pdf bib
Verification of Reasoning Ability using BDI Logic and Large Language Model in AIWolf
Hiraku Gondo | Hiroki Sakaji | Itsuki Noda

We attempt to improve the reasoning capability of LLMs in the Werewolf game by combining BDI logic with LLMs. While LLMs such as ChatGPT have been developed and applied to various tasks, they retain several weaknesses, logical reasoning being one of them. We therefore introduce BDI logic-based prompts to verify the logical reasoning ability of LLMs in Werewolf game dialogue. Experiments and evaluations were conducted using "AI-Werewolf," a communication game for AI with incomplete information. From the results of games played by five agents, we compare the logical reasoning ability of the LLMs using the win rate and the rate of votes cast against the werewolf.
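The paper's actual prompts are not reproduced in the abstract; the following sketch merely illustrates what a BDI (belief-desire-intention) logic-based prompt might look like: the agent's beliefs, desires, and intentions are kept as explicit state and rendered into the instruction given to the LLM. All field contents and wording are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class BDIState:
    """Explicit mental state, maintained outside the LLM."""
    beliefs: list = field(default_factory=list)     # B: what the agent holds true
    desires: list = field(default_factory=list)     # D: goals it wants to achieve
    intentions: list = field(default_factory=list)  # I: actions it has committed to

def bdi_prompt(state: BDIState, dialogue_history: list) -> str:
    """Render the BDI state into a reasoning prompt for the LLM."""
    lines = ["You are playing Werewolf. Reason step by step from this state:"]
    lines.append("Beliefs (B): " + "; ".join(state.beliefs))
    lines.append("Desires (D): " + "; ".join(state.desires))
    lines.append("Intentions (I): " + "; ".join(state.intentions))
    lines.append("Dialogue so far:")
    lines.extend(f"- {u}" for u in dialogue_history)
    lines.append("Given B, D, and I, state your next utterance and your vote.")
    return "\n".join(lines)
```

Keeping B, D, and I outside the model makes each inference step inspectable, which is the point of grounding the prompt in a logic rather than free-form context.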

pdf bib
Enhancing Consistency of Werewolf AI through Dialogue Summarization and Persona Information
Yoshiki Tanaka | Takumasa Kaneko | Hiroki Onozeki | Natsumi Ezure | Ryuichi Uehara | Zhiyang Qi | Tomoya Higuchi | Ryutaro Asahara | Michimasa Inaba

The Werewolf Game is a communication game where players' reasoning and discussion skills are essential. In this study, we present a Werewolf AI agent developed for the AIWolfDial 2024 shared task, co-hosted with the 17th INLG. In recent years, large language models like ChatGPT have garnered attention for their exceptional response generation and reasoning capabilities. We thus develop LLM-based agents for the Werewolf Game. This study aims to enhance the consistency of the agent's utterances by utilizing dialogue summaries generated by LLMs together with manually designed personas and utterance examples. By analyzing self-match game logs, we demonstrate that the agent's utterances are contextually consistent and that the character, including tone, is maintained throughout the game.

pdf bib
An Implementation of Werewolf Agent That does not Truly Trust LLMs
Takehiro Sato | Shintaro Ozaki | Daisaku Yokoyama

Werewolf is an incomplete information game that poses several challenges when creating a computer agent as a player, owing to the agent's limited grasp of the situation and lack of individuality in its utterances (e.g., computer agents are not capable of characterful utterances or situational lying). We propose a werewolf agent that addresses some of these difficulties by combining a Large Language Model (LLM) and a rule-based algorithm. In particular, our agent uses a rule-based algorithm to select its output either from an LLM or from a template prepared beforehand, based on the results of analyzing the conversation history with an LLM. This allows the agent to refute in specific situations, identify when to end the conversation, and behave with a persona. As a result, this approach mitigated conversational inconsistencies and facilitated logical utterances. We also conducted a qualitative evaluation, in which our agent was perceived as more human-like compared to an unmodified LLM. The agent is freely available to contribute to advancing research on the Werewolf game.
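The agent's concrete rules and templates are not given in the abstract; the following sketch illustrates the described select-or-fall-back architecture under assumed situations and templates, with a toy regex classifier standing in for the paper's LLM-based conversation-history analysis.

```python
import re

# Hypothetical templates for situations where a fixed, persona-consistent
# reply is safer than free LLM generation.
TEMPLATES = {
    "accused": "I am not the werewolf. {accuser}, why do you suspect me?",
    "day_end": "Let us vote based on the discussion so far.",
}

def classify_situation(history, me):
    """Toy rule-based stand-in for LLM conversation analysis: decide which
    situation the agent is in from the latest utterances."""
    last = history[-1] if history else ""
    m = re.match(r"(\w+): .*\b" + re.escape(me) + r"\b.*(werewolf|suspicious)", last)
    if m:
        return "accused", {"accuser": m.group(1)}
    if any("vote" in u.lower() for u in history):
        return "day_end", {}
    return "free_talk", {}

def next_utterance(history, me, llm_generate):
    """Use a template in recognized situations; otherwise fall back to the LLM."""
    situation, slots = classify_situation(history, me)
    if situation in TEMPLATES:
        return TEMPLATES[situation].format(**slots)
    return llm_generate(history)  # unconstrained generation only in free talk
```

The design choice this sketches: by routing high-stakes turns (accusations, vote calls) to templates, the agent never contradicts itself in exactly the moments where an LLM is most likely to hallucinate.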