ComFact: A Benchmark for Linking Contextual Commonsense Knowledge
Silin Gao | Jena D. Hwang | Saya Kanno | Hiromi Wakaki | Yuki Mitsufuji | Antoine Bosselut
Findings of the Association for Computational Linguistics: EMNLP 2022
Understanding rich narratives, such as dialogues and stories, often requires natural language processing systems to access relevant knowledge from commonsense knowledge graphs. However, these systems typically retrieve facts from KGs using simple heuristics that disregard the complex challenges of identifying situationally-relevant commonsense knowledge (e.g., contextualization, implicitness, ambiguity).In this work, we propose the new task of commonsense fact linking, where models are given contexts and trained to identify situationally-relevant commonsense knowledge from KGs. Our novel benchmark, ComFact, contains ~293k in-context relevance annotations for commonsense triplets across four stylistically diverse dialogue and storytelling datasets. Experimental results confirm that heuristic fact linking approaches are imprecise knowledge extractors. Learned fact linking models demonstrate across-the-board performance improvements (~34.6% F1) over these heuristics. Furthermore, improved knowledge retrieval yielded average downstream improvements of 9.8% for a dialogue response generation task. However, fact linking models still significantly underperform humans, suggesting our benchmark is a promising testbed for research in commonsense augmentation of NLP systems.
Fundamental Exploration of Evaluation Metrics for Persona Characteristics of Text Utterances
Chiaki Miyazaki | Saya Kanno | Makoto Yoda | Junya Ono | Hiromi Wakaki
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue
To maintain utterance quality of a persona-aware dialog system, inappropriate utterances for the persona should be thoroughly filtered. When evaluating the appropriateness of a large number of arbitrary utterances to be registered in the utterance database of a retrieval-based dialog system, evaluation metrics that require a reference (or a “correct” utterance) for each evaluation target cannot be used. In addition, practical utterance filtering requires the ability to select utterances based on the intensity of persona characteristics. Therefore, we are developing metrics that can be used to capture the intensity of persona characteristics and can be computed without references tailored to the evaluation targets. To this end, we explore existing metrics and propose two new metrics: persona speaker probability and persona term salience. Experimental results show that our proposed metrics show weak to moderate correlations between scores of persona characteristics based on human judgments and outperform other metrics overall in filtering inappropriate utterances for particular personas.
- Hiromi Wakaki 2
- Silin Gao 1
- Jena D. Hwang 1
- Yuki Mitsufuji 1
- Antoine Bosselut 1
- show all...