LLMs Cannot (Yet) Match the Specificity and Simplicity of Online Communities in Long Form Question Answering

Kris-Fillip Kahl, Tolga Buz, Russa Biswas, Gerard De Melo


Abstract
Retail investing is on the rise, and a growing number of users are relying on online finance communities to educate themselves. However, recent years have positioned Large Language Models (LLMs) as powerful question answering (QA) tools, shifting users away from interacting in communities towards discourse with AI-driven conversational interfaces. These AI tools are currently limited by the availability of labelled data containing domain-specific financial knowledge. Therefore, in this work, we curate a QA preference dataset, SocialFinanceQA, for fine-tuning and aligning LLMs, extracted from more than 7.4 million submissions and 82 million comments from 2008 to 2022 in Reddit’s 15 largest finance communities. Additionally, we propose a novel framework called SocialQA-Eval as a generally applicable method to evaluate generated QA responses. We evaluate various LLMs fine-tuned on this dataset, using traditional metrics, LLM-based evaluation, and human annotation. Our results demonstrate the value of high-quality Reddit data, with even state-of-the-art LLMs improving at producing simpler and more specific responses.
Anthology ID:
2024.findings-emnlp.111
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2028–2053
URL:
https://aclanthology.org/2024.findings-emnlp.111
DOI:
10.18653/v1/2024.findings-emnlp.111
Cite (ACL):
Kris-Fillip Kahl, Tolga Buz, Russa Biswas, and Gerard De Melo. 2024. LLMs Cannot (Yet) Match the Specificity and Simplicity of Online Communities in Long Form Question Answering. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 2028–2053, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
LLMs Cannot (Yet) Match the Specificity and Simplicity of Online Communities in Long Form Question Answering (Kahl et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-emnlp.111.pdf
Software:
2024.findings-emnlp.111.software.zip
Data:
2024.findings-emnlp.111.data.zip