SOCCER: An Information-Sparse Discourse State Tracking Collection in the Sports Commentary Domain

Ruochen Zhang, Carsten Eickhoff


Abstract
In the pursuit of natural language understanding, there has been a long-standing interest in tracking state changes throughout narratives. Impressive progress has been made in modeling the state of transaction-centric dialogues and procedural texts. However, this problem has been less intensively studied in the realm of general discourse, where ground truth descriptions of states may be loosely defined and state changes are less densely distributed over utterances. This paper proposes to turn to simplified, fully observable systems that show some of these properties: sports events. We curated 2,263 soccer matches including time-stamped natural language commentary accompanied by discrete events such as a team scoring goals, switching players or being penalized with cards. We propose a new task formulation where, given paragraphs of commentary of a game at different timestamps, the system is asked to recognize the occurrence of in-game events. This domain allows for rich descriptions of state while avoiding the complexities of many other real-world settings. As an initial point of performance measurement, we include two baseline methods from the perspectives of sentence classification with temporal dependence and a current state-of-the-art generative model, respectively, and demonstrate that even sophisticated existing methods struggle on the state tracking task when the definition of state broadens or non-event chatter becomes prevalent.
Anthology ID:
2021.naacl-main.342
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
June
Year:
2021
Address:
Online
Editors:
Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
4325–4333
URL:
https://aclanthology.org/2021.naacl-main.342
DOI:
10.18653/v1/2021.naacl-main.342
Cite (ACL):
Ruochen Zhang and Carsten Eickhoff. 2021. SOCCER: An Information-Sparse Discourse State Tracking Collection in the Sports Commentary Domain. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4325–4333, Online. Association for Computational Linguistics.
Cite (Informal):
SOCCER: An Information-Sparse Discourse State Tracking Collection in the Sports Commentary Domain (Zhang & Eickhoff, NAACL 2021)
PDF:
https://aclanthology.org/2021.naacl-main.342.pdf
Video:
https://aclanthology.org/2021.naacl-main.342.mp4
Data:
MultiWOZ, Open PI