Hanfei Sun


2022

SPORTSINTERVIEW: A Large-Scale Sports Interview Benchmark for Entity-centric Dialogues
Hanfei Sun | Ziyuan Cao | Diyi Yang
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We propose SPORTSINTERVIEW, a novel knowledge-grounded dialogue (interview) dataset set in the domain of sports interviews. The dataset uses two types of external knowledge sources as knowledge grounding and is rich in content, containing about 150K interview sessions and 34K distinct interviewees. Compared to existing knowledge-grounded dialogue datasets, our interview dataset is larger in size, comprises natural dialogues revolving around real-world sports matches, and has more than one dimension of external knowledge linking. We performed several experiments on SPORTSINTERVIEW and found that models such as BART, fine-tuned on our dataset, are able to learn a substantial amount of relevant domain knowledge and generate meaningful sentences (questions or responses). However, their performance, measured by comparison to the gold sentences in the dataset, is still far from human level, which encourages future research utilizing SPORTSINTERVIEW.