Md. Hamjajul Ashmafee
2025
Bengali ChartSumm: A Benchmark Dataset and study on feasibility of Large Language Models on Bengali Chart to Text Summarization
Nahida Akter Tanjila | Afrin Sultana Poushi | Sazid Abdullah Farhan | Abu Raihan Mostofa Kamal | Md. Azam Hossain | Md. Hamjajul Ashmafee
Proceedings of the First Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025)
In today’s data-driven world, effectively organizing and presenting data is challenging, particularly for non-experts. While tabular formats structure data, they often lack intuitive insights; charts, however, offer accessible and impactful visual summaries. Although recent advancements in NLP, powered by large language models (LLMs), have primarily benefited high-resource languages like English, low-resource languages such as Bengali, which is spoken by millions globally, still face significant data limitations. This research addresses this gap by introducing “Bengali ChartSumm,” a benchmark dataset with 4,100 Bengali chart images, metadata, and summaries. This dataset facilitates the analysis of LLMs (mT5, BanglaT5, Gemma) in Bengali chart-to-text summarization, offering essential baselines and evaluations that enhance NLP research for low-resource languages.