DataTales: A Benchmark for Real-World Intelligent Data Narration

Yajing Yang, Qian Liu, Min-Yen Kan


Abstract
We introduce DataTales, a novel benchmark designed to assess the proficiency of language models in data narration, a task crucial for transforming complex tabular data into accessible narratives. Existing benchmarks often fall short in capturing the requisite analytical complexity for practical applications. DataTales addresses this gap by offering 4.9k financial reports paired with corresponding market data, showcasing the demand for models to create clear narratives and analyze large datasets while understanding specialized terminology in the field. Our findings highlight the significant challenge that language models face in achieving the necessary precision and analytical depth for proficient data narration, suggesting promising avenues for future model development and evaluation methodologies.
Anthology ID:
2024.emnlp-main.601
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
10764–10788
URL:
https://aclanthology.org/2024.emnlp-main.601
Cite (ACL):
Yajing Yang, Qian Liu, and Min-Yen Kan. 2024. DataTales: A Benchmark for Real-World Intelligent Data Narration. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 10764–10788, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
DataTales: A Benchmark for Real-World Intelligent Data Narration (Yang et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.601.pdf