Junting Pan


2024

pdf bib
MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs
Zimu Lu | Aojun Zhou | Houxing Ren | Ke Wang | Weikang Shi | Junting Pan | Mingjie Zhan | Hongsheng Li
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large language models (LLMs) have exhibited great potential in mathematical reasoning. However, there remains a performance gap in this area between existing open-source models and closed-source models such as GPT-4. In this paper, we introduce MathGenie, a novel method for generating diverse and reliable math problems by leveraging the ground-truth solutions of the seed data. We augment these ground-truth solutions and use a specially finetuned model to translate these augmented solutions back into new questions. Subsequently, we generate code-integrated solutions for these questions. To ensure the correctness of the code-integrated solutions, we employ rationale-based verification for filtering. Then, we finetune various pretrained models, ranging from 7B to 70B, on the newly curated data, resulting in a family of models known as MathGenie. These models consistently outperform previous open-source models across five representative mathematical reasoning datasets, achieving state-of-the-art performance. In particular, MathGenie-InternLM2 achieves an accuracy of 87.7% on GSM8K and 55.7% on MATH, securing the best overall score.

pdf bib
GroundingGPT: Language Enhanced Multi-modal Grounding Model
Zhaowei Li | Qi Xu | Dong Zhang | Hang Song | YiQing Cai | Qi Qi | Ran Zhou | Junting Pan | Zefeng Li | Vu Tu | Zhida Huang | Tao Wang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Multi-modal large language models (MLLMs) have demonstrated remarkable performance across various tasks. However, these models often prioritize capturing global information and overlook the importance of perceiving local information. This limitation hinders their ability to effectively understand fine-grained details and handle grounding tasks that necessitate nuanced comprehension. Although some recent works have made strides in this, they have primarily focused on single-modality inputs. Therefore, we propose GroundingGPT, an end-to-end language enhanced multi-modal grounding model. It is designed to perform fine-grained grounding tasks for three modalities: image, video and audio. To enhance the model’s performance, we adopt a coarse-to-fine training strategy, utilizing a three-stage training approach to progressively enhance the model’s semantic awareness and fine-grained understanding capabilities. Additionally, we employ a diversified stage-specific dataset construction pipeline, developing a multi-modal, multi-granularity dataset tailored for training the model in different stages. Extensive experiments conducted on multiple multi-modal benchmarks demonstrate that our model achieves impressive fine-grained understanding of multi-modal inputs on grounding tasks while maintaining or improving its global comprehension capabilities. Our code, model, and dataset are available at https://github.com/lzw-lzw/GroundingGPT.