Getting MoRE out of Mixture of Language Model Reasoning Experts

Chenglei Si, Weijia Shi, Chen Zhao, Luke Zettlemoyer, Jordan Boyd-Graber


Abstract
While recent large language models (LLMs) improve on various question answering (QA) datasets, it remains difficult for a single model to generalize across question types that require distinct reasoning abilities. We provide empirical evidence that state-of-the-art LLMs suffer from poor generalizability on reasoning types beyond those seen in the prompt. To remedy this, we propose a Mixture-of-Reasoning-Experts (MORE) framework that ensembles diverse specialized language models. We specialize the backbone language model with prompts optimized for different reasoning categories, including factual, multihop, mathematical, and commonsense reasoning. Our key insight is to leverage agreement among the specialized experts to select the best answer for each question, or to abstain from answering. This gives MORE higher accuracy than any single specialized model on a collection of 12 QA datasets from four reasoning types. Beyond generalizability, the interpretable design of MORE improves selective question answering results compared to baselines without incorporating inter-expert agreement. This framework is also more interpretable and useful to human consumers of QA outputs. Our human study confirms that presenting expert predictions and the answer selection process helps annotators more accurately calibrate when to trust the system’s output. We release all code and data to facilitate future work.
Anthology ID:
2023.findings-emnlp.552
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
8234–8249
Language:
URL:
https://aclanthology.org/2023.findings-emnlp.552
DOI:
10.18653/v1/2023.findings-emnlp.552
Bibkey:
Cite (ACL):
Chenglei Si, Weijia Shi, Chen Zhao, Luke Zettlemoyer, and Jordan Boyd-Graber. 2023. Getting MoRE out of Mixture of Language Model Reasoning Experts. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 8234–8249, Singapore. Association for Computational Linguistics.
Cite (Informal):
Getting MoRE out of Mixture of Language Model Reasoning Experts (Si et al., Findings 2023)
Copy Citation:
PDF:
https://aclanthology.org/2023.findings-emnlp.552.pdf