Yan Ma
2024
Weak-to-Strong Reasoning
Yuqing Yang
|
Yan Ma
|
Pengfei Liu
Findings of the Association for Computational Linguistics: EMNLP 2024
When large language models (LLMs) surpass human capabilities, supervising them effectively becomes difficult. Weak-to-strong learning, where a less capable model enhances a stronger one, proves valuable in this context. Yet, the efficacy of this paradigm for complex reasoning tasks is still unexplored. In this paper, we introduce a progressive weak-to-strong reasoning framework that enables the strong model to autonomously refine its training data, maximizing the use of weak signals and unlocking its latent abilities. This framework begins with supervised fine-tuning on a selective small but high-quality dataset, followed by preference optimization on contrastive samples identified by the strong model itself. Experiments on the GSM8K and MATH datasets verify that our method can effectively improve the reasoning capabilities of Llama2-70b using three separate weak models. This work paves the way for a more scalable and sophisticated strategy to enhance AI reasoning powers. All relevant code and resources are available in https://github.com/GAIR-NLP/weak-to-strong-reasoning.
MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation
Yan Ma
|
Yu Qiao
|
Pengfei Liu
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
A story premise succinctly defines a story’s main idea, foundation, and trajectory. It serves as the initial trigger in automatic story generation. Existing sources of story premises are limited by a lack of diversity, uneven quality, and high costs that make them difficult to scale. In response, we introduce Modular Story Premise Synthesis (MoPS) which breaks down story premises into modules like background and persona for automated design and generation. MoPS consists of three phases: (1) Pre-collect a consistent set of candidates for each module to form a nested dictionary. (2) Extract a key path from the nested dictionary as the premise design. (3) Instruct an LLM to integrate the design into a coherent premise sentence. Thorough evaluations demonstrate that our synthesized premises excel in diversity, fascination, completeness, and originality compared to those induced from large language models and captured from public story datasets. Similarly, the extended novels and scripts generated from our premises also exhibit higher quality. In supplementary materials, we provide the MoPS code suite, along with 7.5k generated premises and 1k extended stories.
Search