Tianqiao Liu
2025
StatsChartMWP: A Dataset for Evaluating Multimodal Mathematical Reasoning Abilities on Math Word Problems with Statistical Charts
Dan Zhu | Tianqiao Liu | Zitao Liu
Findings of the Association for Computational Linguistics: EMNLP 2025
Recent advancements in Large Multimodal Models (LMMs) have showcased their impressive capabilities in mathematical reasoning tasks in visual contexts. As a step toward developing AI models that conduct rigorous multi-step multimodal reasoning, we introduce StatsChartMWP, a real-world educational dataset for evaluating visual mathematical reasoning abilities on math word problems (MWPs) with statistical charts. Our dataset contains 8,514 chart-based MWPs, meticulously curated by K-12 educators within real-world teaching scenarios. We provide detailed preprocessing steps and manual annotations to help evaluate state-of-the-art models on StatsChartMWP. Comparing baselines, we find that current models struggle to carry out meticulous multi-step mathematical reasoning across technical language, diagrams, tables, and equations. To alleviate this gap, we introduce CoTAR, a chain-of-thought (CoT) augmented reasoning solution that fine-tunes LMMs with solution-oriented CoT-style reasoning steps. The LMM trained with CoTAR is more effective than current open-source approaches. We conclude by shedding light on the challenges and opportunities in enhancing LMMs, aiming to steer future research and development efforts in the realm of statistical chart comprehension and analysis. The code and data are available at https://github.com/ai4ed/StatsChartMWP.
2021
CTAL: Pre-training Cross-modal Transformer for Audio-and-Language Representations
Hang Li | Wenbiao Ding | Yu Kang | Tianqiao Liu | Zhongqin Wu | Zitao Liu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Existing audio-language task-specific predictive approaches focus on building complicated late-fusion mechanisms. However, these models face challenges of overfitting with limited labels and low model generalization ability. In this paper, we present a Cross-modal Transformer for Audio-and-Language, i.e., CTAL, which aims to learn the intra-modality and inter-modality connections between audio and language through two proxy tasks on a large amount of audio-and-language pairs: masked language modeling and masked cross-modal acoustic modeling. After fine-tuning our pre-trained model on multiple downstream audio-and-language tasks, we observe significant improvements across various tasks, such as emotion classification, sentiment analysis, and speaker verification. On this basis, we further propose a specially designed fusion mechanism for the fine-tuning phase, which allows our pre-trained model to achieve better performance. Lastly, we present detailed ablation studies demonstrating that both our novel cross-modality fusion component and our audio-language pre-training methods contribute significantly to the promising results. The code and pre-trained models are available at https://github.com/tal-ai/CTAL_EMNLP2021.
Mathematical Word Problem Generation from Commonsense Knowledge Graph and Equations
Tianqiao Liu | Qiang Fang | Wenbiao Ding | Hang Li | Zhongqin Wu | Zitao Liu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
There is increasing interest in the use of mathematical word problem (MWP) generation in educational assessment. Different from standard natural question generation, MWP generation needs to maintain the underlying mathematical operations between quantities and variables, while at the same time ensuring the relevance between the output and the given topic. To address the above problems, we develop an end-to-end neural model that generates diverse MWPs in real-world scenarios from a commonsense knowledge graph and equations. The proposed model (1) learns representations from both edge-enhanced Levi graphs of symbolic equations and commonsense knowledge; (2) automatically fuses equation and commonsense knowledge information via a self-planning module when generating the MWPs. Experiments on an educational gold-standard set and a large-scale generated MWP set show that our approach is superior on the MWP generation task: it outperforms the SOTA models in terms of both automatic evaluation metrics, i.e., BLEU-4, ROUGE-L, and Self-BLEU, and human evaluation metrics, i.e., equation relevance, topic relevance, and language coherence. To encourage reproducible results, we make our code and MWP dataset publicly available at https://github.com/tal-ai/MaKE_EMNLP2021.