Barch: an English Dataset of Bar Chart Summaries

Iza Škrjanec, Muhammad Salman Edhi, Vera Demberg


Abstract
We present Barch, a new English dataset of human-written summaries describing bar charts. This dataset contains 47 charts based on a selection of 18 topics. Each chart is associated with one of four intended messages expressed in the chart title. Using crowdsourcing, we collected around 20 summaries per chart, or about one thousand in total. The text of the summaries is aligned with the chart data as well as with analytical inferences about the data drawn by humans. Our dataset is one of the first to explore the effect of intended messages on the data descriptions in chart summaries. Additionally, it lends itself well to the task of training data-driven systems for chart-to-text generation. We provide results on the performance of state-of-the-art neural generation models trained on this dataset and discuss the strengths and shortcomings of different models.
Anthology ID:
2022.lrec-1.380
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
3552–3560
URL:
https://aclanthology.org/2022.lrec-1.380
Cite (ACL):
Iza Škrjanec, Muhammad Salman Edhi, and Vera Demberg. 2022. Barch: an English Dataset of Bar Chart Summaries. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 3552–3560, Marseille, France. European Language Resources Association.
Cite (Informal):
Barch: an English Dataset of Bar Chart Summaries (Škrjanec et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.380.pdf
Data:
AutoChart, Chart2Text