Shuo Yin


2024

pdf bib
MuMath: Multi-perspective Data Augmentation for Mathematical Reasoning in Large Language Models
Weihao You | Shuo Yin | Xudong Zhao | Zhilong Ji | Guoqiang Zhong | Jinfeng Bai
Findings of the Association for Computational Linguistics: NAACL 2024

Recently, the tool-use Large Language Models (LLMs) that integrate with external Python interpreters have significantly enhanced mathematical reasoning capabilities for open-source LLMs. However, these models fall short in demonstrating the calculation process, which compromises user-friendliness and understanding of problem-solving steps. Conversely, while tool-free methods offer a clear display of the problem-solving process, their accuracy leaves room for improvement.These tool-free methods typically employ a somewhat narrow range of augmentation techniques such as rephrasing and difficulty enhancement to boost performance. In response to this issue, we have amalgamated and further refined these strengths while broadening the scope of augmentation methods to construct a **mu**lti-perspective augmentation dataset for **math**ematics—termed **MuMath** (𝜇-Math) Dataset.Subsequently, we finetune LLaMA-2 on the MuMath dataset to derive the MuMath model. Our experiments indicate that our MuMath-70B model achieves new state-of-the-art performance among tool-free methods—achieving 88.3% on GSM8K and 34.5% on MATH .We release the MuMath dataset along with its corresponding models and code for public use.

2023

pdf bib
ATFormer: A Learned Performance Model with Transfer Learning Across Devices for Deep Learning Tensor Programs
Yang Bai | Wenqian Zhao | Shuo Yin | Zixiao Wang | Bei Yu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

The training and inference efficiency of ever-larger deep neural networks highly rely on the performance of tensor operators on specific hardware platforms. Therefore, a compilation-based optimization flow with automatic tensor generation and parameter tuning is necessary for efficient model deployment. While compilation-based methods with performance models can provide dynamic and suitable code optimization, they suffer from a large design space exploration with rough measurement accuracy and poor transferability among different hardware platforms. This paper presents ATFormer, a simple yet efficient design with attention-inspired modules to accurately predict the performance of optimized operators by capturing global and long-range dependencies within a complete scheduling space. Compared with state-of-the-arts, ATFormer can predict the optimal implementation of tensor operators to reduce inference time with minimal effort on modern DNN benchmarks. Furthermore, ATFormer with pre-trained parameters can quickly adapt to different workloads and hardware via transfer learning.