Kristiina Jokinen

Also published as: Päivi Kristiina Jokinen


2024

The Need for Grounding in LLM-based Dialogue Systems
Kristiina Jokinen
Proceedings of the Workshop: Bridging Neurons and Symbols for Natural Language Processing and Knowledge Graphs Reasoning (NeusymBridge) @ LREC-COLING-2024

Grounding is a pertinent part of the design of LLM-based dialogue systems. Although research on grounding has a long tradition, the paradigm shift caused by LLMs has brought the concept to the foreground, in particular in the context of cognitive robotics. To avoid generating irrelevant or false information, the system needs to ground its utterances in real-world events, and to avoid the statistical parrot effect, it needs to construct a shared understanding of the dialogue context and of the partner’s intents. Grounding and the construction of a shared context enable cooperation between the participants, and thus support trustworthy interaction. This paper discusses grounding using neural LLM technology. It aims to bridge neural and symbolic computing at the cognitive architecture level, so as to contribute to a better understanding of how conversational reasoning and collaboration can be linked to LLM implementations to support trustworthy and flexible interaction.

Bridging Information Gaps in Dialogues with Grounded Exchanges Using Knowledge Graphs
Phillip Schneider | Nektarios Machner | Kristiina Jokinen | Florian Matthes
Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Knowledge models are fundamental to dialogue systems for enabling conversational interactions, which require handling domain-specific knowledge. Ensuring effective communication in information-providing conversations entails aligning user understanding with the knowledge available to the system. However, dialogue systems often face challenges arising from semantic inconsistencies in how information is expressed in natural language compared to how it is represented within the system’s internal knowledge. To address this problem, we study the potential of large language models for conversational grounding, a mechanism to bridge information gaps by establishing shared knowledge between dialogue participants. Our approach involves annotating human conversations across five knowledge domains to create a new dialogue corpus called BridgeKG. Through a series of experiments on this dataset, we empirically evaluate the capabilities of large language models in classifying grounding acts and identifying grounded information items within a knowledge graph structure. Our findings offer insights into how these models use in-context learning for conversational grounding tasks and common prediction errors, which we illustrate with examples from challenging dialogues. We discuss how the models handle knowledge graphs as a semantic layer between unstructured dialogue utterances and structured information items.
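
As a rough illustration of the kind of pipeline evaluated in the paper, the sketch below classifies a grounding act with a generic text-in/text-out LLM callable and looks up mentioned items in a small triple store. The label set, prompt wording and helper names are hypothetical assumptions for illustration and do not reproduce the BridgeKG schema or the authors' prompts.

```python
# Illustrative sketch (not the BridgeKG implementation): classify the grounding
# act of the latest utterance with an LLM and look up mentioned items in a
# small knowledge-graph triple store.

from dataclasses import dataclass
from typing import Callable, List

# Hypothetical label set; the paper's grounding-act inventory may differ.
GROUNDING_ACTS = ["explicit_grounding", "implicit_grounding", "clarification", "no_grounding"]

@dataclass
class Triple:
    subject: str
    predicate: str
    obj: str

def build_prompt(dialogue: List[str], utterance: str) -> str:
    """In-context prompt asking the LLM to choose exactly one grounding act."""
    history = "\n".join(dialogue[-4:])  # keep only a short context window
    return (
        "Classify the grounding act of the last utterance.\n"
        f"Possible acts: {', '.join(GROUNDING_ACTS)}\n"
        f"Dialogue so far:\n{history}\n"
        f"Last utterance: {utterance}\n"
        "Answer with one act label only."
    )

def classify_grounding_act(llm: Callable[[str], str],
                           dialogue: List[str], utterance: str) -> str:
    """Delegate classification to any text-in/text-out LLM callable."""
    answer = llm(build_prompt(dialogue, utterance)).strip().lower()
    return answer if answer in GROUNDING_ACTS else "no_grounding"

def ground_in_graph(utterance: str, graph: List[Triple]) -> List[Triple]:
    """Naive lexical match: return triples whose subject or object is mentioned."""
    text = utterance.lower()
    return [t for t in graph if t.subject.lower() in text or t.obj.lower() in text]

if __name__ == "__main__":
    kg = [Triple("Mount Everest", "height", "8849 m")]
    fake_llm = lambda prompt: "explicit_grounding"  # stand-in for a real model
    act = classify_grounding_act(fake_llm, ["U: How high is Mount Everest?"],
                                 "S: Mount Everest is 8849 m high, is that right?")
    print(act, ground_in_graph("Yes, 8849 m for Mount Everest.", kg))
```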

2023

From Data to Dialogue: Leveraging the Structure of Knowledge Graphs for Conversational Exploratory Search
Phillip Schneider | Nils Rehtanz | Kristiina Jokinen | Florian Matthes
Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation

2022

Cognitive States and Types of Nods
Taiga Mori | Kristiina Jokinen | Yasuharu Den
Proceedings of the 2nd Workshop on People in Vision, Language, and the Mind

In this paper we study how different types of nods are related to the cognitive states of the listener. A distinction is made between nods whose movement starts upwards (up-nods) and nods whose movement starts downwards (down-nods), as well as between single and repetitive nods. The data come from Japanese multiparty conversations, and the results accord with previous findings indicating that up-nods are related to a change in the listener’s cognitive state after hearing the partner’s contribution, while down-nods convey that the listener’s cognitive state is not changed.
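
The up-/down-nod and single/repetitive distinctions can be illustrated with a toy rule over a head-pitch trajectory; the thresholds and function below are assumptions for illustration only, not the authors' annotation or analysis procedure.

```python
# Toy illustration (not the authors' method): label a nod from a head-pitch
# trajectory as an up-nod or down-nod by its initial movement direction, and
# as single or repetitive by counting direction reversals.

from typing import List, Tuple

def classify_nod(pitch: List[float], threshold: float = 0.5) -> Tuple[str, str]:
    """pitch: head pitch angles over time (degrees); positive = upward tilt."""
    deltas = [b - a for a, b in zip(pitch, pitch[1:])]
    # The first clearly non-zero movement decides up-nod vs down-nod.
    first = next((d for d in deltas if abs(d) > threshold), 0.0)
    direction = "up-nod" if first > 0 else "down-nod"
    # Count sign changes of the movement to separate single from repetitive nods.
    signs = [1 if d > threshold else -1 for d in deltas if abs(d) > threshold]
    reversals = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    form = "repetitive" if reversals >= 3 else "single"
    return direction, form

print(classify_nod([0.0, -2.0, -4.0, -1.0, 0.0]))          # ('down-nod', 'single')
print(classify_nod([0.0, 2.0, 0.0, 2.5, 0.5, 2.0, 0.0]))   # ('up-nod', 'repetitive')
```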

2020

The AICO Multimodal Corpus – Data Collection and Preliminary Analyses
Kristiina Jokinen
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper describes data collection and the first explorative research on the AICO Multimodal Corpus. The corpus contains eye-gaze, Kinect, and video recordings of human-robot and human-human interactions, and was collected to study cooperation, engagement and attention of human participants in task-based as well as in chatty type interactive situations. In particular, the goal was to enable comparison between human-human and human-robot interactions, besides studying multimodal behaviour and attention in the different dialogue activities. The robot partner was a humanoid Nao robot, and it was expected that its agent-like behaviour would render human-robot interactions similar to human-human interaction but also highlight important differences due to the robot’s limited conversational capabilities. The paper reports on the preliminary studies on the corpus, concerning the participants’ eye-gaze and gesturing behaviours, which were chosen as objective measures to study differences in their multimodal behaviour patterns with a human and a robot partner.

Analysis of Body Behaviours in Human-Human and Human-Robot Interactions
Taiga Mori | Kristiina Jokinen | Yasuharu Den
Proceedings of LREC2020 Workshop "People in language, vision and the mind" (ONION2020)

We conducted a preliminary comparison of human-robot (HR) interaction with human-human (HH) interaction in English and in Japanese. As a result, body gestures increased in HR, while hand and head gestures decreased in HR. Concerning hand gestures, they were composed of more diverse and complex forms, trajectories and functions in HH than in HR. Moreover, English speakers produced six times more hand gestures than Japanese speakers in HH. Regarding head gestures, even though there was no difference in the frequency of head gestures between English speakers and Japanese speakers in HH, Japanese speakers produced slightly more nodding during the robot’s speaking than English speakers in HR. Furthermore, the positions of nods differed depending on the language. Concerning body gestures, participants produced body gestures mostly to regulate an appropriate distance from the robot in HR. Additionally, English speakers produced slightly more body gestures than Japanese speakers.

2018

Researching Less-Resourced Languages – the DigiSami Corpus
Kristiina Jokinen
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue
Kristiina Jokinen | Manfred Stede | David DeVault | Annie Louis
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

2016

Double Topic Shifts in Open Domain Conversations: Natural Language Interface for a Wikipedia-based Robot Application
Kristiina Jokinen | Graham Wilcock
Proceedings of the Open Knowledge Base and Question Answering Workshop (OKBQA 2016)

The paper describes topic shifting in dialogues with a robot that provides information from Wikipedia. The work focuses on a double topical construction of dialogue coherence, which refers to discourse coherence on two levels: the evolution of dialogue topics via the interaction between the user and the robot system, and the creation of discourse topics via the content of the Wikipedia article itself. The user selects topics that are of interest to her, and the system builds a list of potential topics, anticipated to be the next topic, based on the links in the article and the keywords extracted from it. The described system deals with Wikipedia articles, but could easily be adapted to other digital information-providing systems.
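
A minimal sketch of the topic-anticipation step described above, assuming the article text and its outgoing links are already available; the keyword-based ranking heuristic and function names are illustrative assumptions, not the WikiTalk implementation.

```python
# Illustrative sketch (not the WikiTalk code): rank the links of the current
# Wikipedia article as candidate next topics, preferring links whose titles
# also occur among frequent keywords in the article text.

import re
from collections import Counter
from typing import List

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "was", "for"}

def keywords(text: str, top_n: int = 20) -> Counter:
    """Very simple keyword extraction: most frequent non-stopword tokens."""
    tokens = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return Counter(dict(Counter(tokens).most_common(top_n)))

def candidate_topics(article_text: str, article_links: List[str], k: int = 5) -> List[str]:
    """Score each outgoing link by how often its words appear among the article's keywords."""
    kw = keywords(article_text)
    def score(link: str) -> int:
        return sum(kw.get(w, 0) for w in re.findall(r"[a-z]+", link.lower()))
    return sorted(article_links, key=score, reverse=True)[:k]

text = ("Shakespeare wrote many plays. His plays were performed at the Globe "
        "Theatre in London. The plays include Hamlet and Macbeth.")
links = ["Globe Theatre", "Hamlet", "London", "Sonnet"]
print(candidate_topics(text, links))  # links mentioned in the article text rank first
```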

What topic do you want to hear about? A bilingual talking robot using English and Japanese Wikipedias
Graham Wilcock | Kristiina Jokinen | Seiichi Yamamoto
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

We demonstrate a bilingual robot application, WikiTalk, that can talk fluently in both English and Japanese about almost any topic using information from English and Japanese Wikipedias. The English version of the system has been demonstrated previously, but we now present a live demo with a Nao robot that speaks English and Japanese and switches language on request. The robot supports the verbal interaction with face-tracking, nodding and communicative gesturing. One of the key features of the WikiTalk system is that the robot can switch from the current topic to related topics during the interaction in order to navigate around Wikipedia following the user’s individual interests.

2015

Sentiment analysis on conversational texts
Birgitta Ojamaa | Päivi Kristiina Jokinen | Kadri Muischnek
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)

Multilingual WikiTalk: Wikipedia-based talking robots that switch languages.
Graham Wilcock | Kristiina Jokinen
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2014

Open-domain Interaction and Online Content in the Sami Language
Kristiina Jokinen
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents data collection and collaborative community events organised within the project Digital Natives on the North Sami language. The project is one of the collaboration initiatives on endangered Finno-Ugric languages, supported by the larger framework between the Academy of Finland and the Hungarian Academy of Sciences. The goal of the project is to improve digital visibility and viability of the targeted Finno-Ugric languages, as well as to develop language technology tools and resources in order to assist automatic language processing and experimenting with multilingual interactive applications.

Towards automatic annotation of communicative gesturing
Kristiina Jokinen | Graham Wilcock
Proceedings of the Third Workshop on Vision and Language

2013

Open-Domain Information Access with Talking Robots
Kristiina Jokinen | Graham Wilcock
Proceedings of the SIGDIAL 2013 Conference

2012

Explorations in the Speakers’ Interaction Experience and Self-Assessments
Kristiina Jokinen
Proceedings of COLING 2012: Posters

Multimodal Signals and Holistic Interaction Structuring
Kristiina Jokinen | Graham Wilcock
Proceedings of COLING 2012: Posters

Investigating Engagement - intercultural and technological aspects of the collection, analysis, and use of the Estonian Multiparty Conversational video data
Kristiina Jokinen | Silvi Tenjes
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this paper we describe the goals of the Estonian corpus collection and analysis activities, and introduce the recent collection of Estonian First Encounters data. The MINT project aims at deepening our understanding of the conversational properties and practices in human interactions. We especially investigate conversational engagement and cooperation, and discuss some observations on the participants' views concerning the interaction in which they have been engaged.

Constructive Interaction for Talking about Interesting Topics
Kristiina Jokinen | Graham Wilcock
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The paper discusses mechanisms for topic management in conversations, concentrating on interactions where the interlocutors react to each other's presentation of new information and construct a shared context in which to exchange information about interesting topics. This is illustrated with a robot simulator that can talk about unrestricted (open-domain) topics that the human interlocutor shows interest in. Wikipedia is used as the source of information from which the robotic agent draws its world knowledge.

Multimodal Corpus of Multi-party Conversations in Second Language
Shota Yamasaki | Hirohisa Furukawa | Masafumi Nishida | Kristiina Jokinen | Seiichi Yamamoto
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We developed a dialogue-based tutoring system for teaching English to Japanese students and plan to transfer the current software tutoring agent into an embodied robot, in the hope that the robot will enrich conversation by allowing more natural interactions in small group learning situations. To enable smooth communication between an intelligent agent and the user, the agent must have realistic models of when to take turns, when to interrupt, and how to catch the partner's attention. For developing realistic models applicable to computer-assisted language learning systems, we also need to consider the differences between the mother tongue and the second language that affect communication style. We collected a multimodal corpus of multi-party conversations in English as a second language to investigate these differences in communication styles. We describe our multimodal corpus and explore features of communication style, e.g. filled pauses, and non-verbal information, such as eye-gaze, which show different characteristics between the mother tongue and the second language.

Feedback in Nordic First-Encounters: a Comparative Study
Costanza Navarretta | Elisabeth Ahlsén | Jens Allwood | Kristiina Jokinen | Patrizia Paggio
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The paper compares how feedback is expressed via speech and head movements in comparable corpora of first encounters in three Nordic languages: Danish, Finnish and Swedish. The three corpora have been collected following common guidelines, and they have been annotated according to the same scheme in the NOMCO project. The results of the comparison show that in this data the most frequent feedback-related head movement is the Nod in all three languages. Two types of Nods were distinguished in all corpora: Down-nods and Up-nods; the participants from the three countries use Down- and Up-nods with different frequencies. In particular, Danes use Down-nods more frequently than Finns and Swedes, while Swedes use Up-nods more frequently than Finns and Danes. Furthermore, Finns use single Nods more often than repeated Nods, differing from the Swedish and Danish participants. The differences in the frequency of both Down-nods and Up-nods in the Danish, Finnish and Swedish interactions are interesting given that the Nordic countries are not only geographically near, but are also considered to be very similar culturally. Finally, a comparison of feedback-related words in the Danish and Swedish corpora shows that Swedes and Danes use common feedback words corresponding to yes and no with similar frequency.

2011

Creating Comparable Multimodal Corpora for Nordic Languages
Costanza Navarretta | Elisabeth Ahlsén | Jens Allwood | Kristiina Jokinen | Patrizia Paggio
Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011)

2010

The NOMCO Multimodal Nordic Resource - Goals and Characteristics
Patrizia Paggio | Jens Allwood | Elisabeth Ahlsén | Kristiina Jokinen | Costanza Navarretta
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents the multimodal corpora that are being collected and annotated in the Nordic NOMCO project. The corpora will be used to study communicative phenomena such as feedback, turn management and sequencing. They already include video material for Swedish, Danish, Finnish and Estonian, and several social activities are represented. The data will make it possible to verify empirically how gestures (head movements, facial displays, hand gestures and body postures) and speech interact in all three of the aforementioned aspects of communication. The data are being annotated following the MUMIN annotation scheme, which provides attributes concerning the shape and the communicative functions of head movements, facial expressions, body posture and hand gestures. After having described the corpora, the paper discusses how they will be used to study the way feedback is expressed in speech and gestures, and reports results from two pilot studies in which we investigated the function of head gestures, both single and repeated, in combination with feedback expressions. The annotated corpora will be valuable sources for research on intercultural communication as well as for interaction in the individual languages.
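
For illustration, a MUMIN-style annotation record could be represented as sketched below; the attribute names and example values are simplified assumptions rather than the scheme's actual controlled vocabularies.

```python
# Sketch of a MUMIN-style annotation record (attribute names simplified and
# illustrative; the actual scheme defines its own attributes and value sets).

from dataclasses import dataclass

@dataclass
class GestureAnnotation:
    start: float     # seconds from the start of the recording
    end: float
    articulator: str # e.g. "head", "hand", "body", "face"
    shape: str       # form of the movement, e.g. "nod-down", "nod-up"
    function: str    # communicative function, e.g. "feedback-give", "turn-take"

nod = GestureAnnotation(start=12.4, end=12.9, articulator="head",
                        shape="nod-down", function="feedback-give")
print(nod)
```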

Non-verbal Signals for Turn-taking and Feedback
Kristiina Jokinen
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper concerns non-verbal communication, and describes especially the use of eye-gaze to signal turn-taking and feedback in conversational settings. Eye-gaze supports smooth interaction by providing signals that the interlocutors interpret with respect to such conversational functions as taking turns and giving feedback. New possibilities to study the effect of eye-gaze on the interlocutors’ communicative behaviour have appeared with eye-tracking technology, which in recent years has matured to the level where its use to study naturally occurring dialogues has become easier and more reliable. It enables the tracking of eye-fixations and gaze-paths, and thus allows analysis of a person’s turn-taking and feedback behaviour through the analysis of their focus of attention. In this paper, experiments on the interlocutors’ non-verbal communication in conversational settings using the eye-tracker are reported, and results of classifying turn-taking using eye-gaze and gesture information are presented. A hybrid method that combines signal-level analysis with human interpretation is also discussed.
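
As a rough illustration of how gaze and gesture cues can feed a turn-taking decision, the toy rule below flags a likely turn yield; the features, threshold and function name are assumptions for illustration, not the classifier reported in the paper.

```python
# Minimal illustrative rule (not the paper's classifier): predict whether the
# current speaker is yielding the turn from simple eye-gaze and gesture cues.

from dataclasses import dataclass

@dataclass
class Frame:
    gaze_at_partner: bool  # speaker looks at the interlocutor
    gesturing: bool        # a hand gesture is still in progress
    pause_ms: int          # silence following the utterance so far

def predicts_turn_yield(frame: Frame) -> bool:
    """Gazing at the partner with no ongoing gesture and a pause suggests turn yielding."""
    return frame.gaze_at_partner and not frame.gesturing and frame.pause_ms > 200

print(predicts_turn_yield(Frame(gaze_at_partner=True, gesturing=False, pause_ms=350)))   # True
print(predicts_turn_yield(Frame(gaze_at_partner=False, gesturing=True, pause_ms=400)))   # False
```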

2009

Proceedings of the 17th Nordic Conference of Computational Linguistics (NODALIDA 2009)
Kristiina Jokinen | Eckhard Bick
Proceedings of the 17th Nordic Conference of Computational Linguistics (NODALIDA 2009)

2007

Quality of Service and Communicative Competence in NLG Evaluation
Kristiina Jokinen
Proceedings of the Eleventh European Workshop on Natural Language Generation (ENLG 07)

2005

Proceedings of the Tenth European Workshop on Natural Language Generation (ENLG-05)
Graham Wilcock | Kristiina Jokinen | Chris Mellish | Ehud Reiter
Proceedings of the Tenth European Workshop on Natural Language Generation (ENLG-05)

2004

User Expertise Modeling and Adaptivity in a Speech-Based E-Mail System
Kristiina Jokinen | Kari Kanto
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

2003

Introduction: Dialogue Systems: Interaction, Adaptation and Styles of Management
Kristiina Jokinen | Björn Gambäck | William Black | Roberta Catizone | Yorick Wilks
Proceedings of the 2003 EACL Workshop on Dialogue Systems: interaction, adaptation and styles of management

Distributed Dialogue Management in a Blackboard Architecture
Antti Kerminen | Kristiina Jokinen
Proceedings of the 2003 EACL Workshop on Dialogue Systems: interaction, adaptation and styles of management

2002

Adaptive Dialogue Systems - Interaction with Interact
Kristiina Jokinen | Antti Kerminen | Tommi Lagus | Jukka Kuusisto | Graham Wilcock | Markku Turunen | Jaakko Hakulinen | Krista Jauhiainen
Proceedings of the Third SIGdial Workshop on Discourse and Dialogue

2001

Confidence-Based Adaptivity in Response Generation for a Spoken Dialogue System
Kristiina Jokinen | Graham Wilcock
Proceedings of the Second SIGdial Workshop on Discourse and Dialogue

Clustering dialogue knowledge with self-organizing maps
Mauri Kaipainen | Kristiina Jokinen | Timo Koskenniemi | Antti Kerminen | Kari Kanto
Proceedings of the 13th Nordic Conference of Computational Linguistics (NODALIDA 2001)

1998

Context Management with Topics for Spoken Dialogue Systems
Kristiina Jokinen | Hideki Tanaka | Akio Yokoo
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

Context Management with Topics for Spoken Dialogue Systems
Kristiina Jokinen | Hideki Tanaka | Akio Yokoo
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

Planning Dialogue Contributions With New Information
Kristiina Jokinen | Hideki Tanaka | Akio Yokoo
Natural Language Generation

1996

Goal Formulation based on Communicative Principles
Kristiina Jokinen
COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics