Aligning LLMs for Thai Legal Question Answering with Efficient Semantic-Similarity Rewards

Pawitsapak Akarajaradwong; Chompakorn Chaksangchaichot; Pirat Pothavorn; Ekapol Chuangsuwanich; Attapol Rutherford; Sarana Nutanong

doi:10.18653/v1/2025.nllp-1.21

Abstract

The performance of Retrieval-Augmented Generation (RAG) systems on Thai legal question answering remains limited, especially for questions requiring extensive, complex legal reasoning. To address this limitation, we introduce a resource-efficient approach that aligns Large Language Models (LLMs) for improved citation accuracy and response quality using Group-Relative Policy Optimization (GRPO). Our method uses BGE-M3 embeddings as a cost-efficient semantic-similarity reward, reducing computational cost by up to 2.5x compared to an LLM-based reward model. Experiments on the NitiBench benchmark demonstrate substantial improvements: GRPO achieves citation-F1 gains of up to 90% relative to the base model and a 31% increase in joint quality metrics over instruction tuning. Our approach thus provides a practical and effective solution for enhancing legal LLMs in resource-constrained environments.
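The abstract does not spell out how the semantic-similarity reward is computed. The following is a minimal sketch under two assumptions: that the reward is the cosine similarity between the embedding of a generated answer and the embedding of a reference answer, and that GRPO normalises rewards within each group of sampled answers (the standard group-relative formulation). Short toy vectors stand in for BGE-M3 embeddings, and all function names are hypothetical rather than taken from the paper.

```python
import math
from statistics import mean, pstdev


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_rewards(reference_vec, candidate_vecs):
    """Assumed reward: similarity of each sampled answer to the reference answer."""
    return [cosine_similarity(reference_vec, v) for v in candidate_vecs]


def group_relative_advantages(rewards, eps=1e-8):
    """Normalise rewards within one group: (r - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy stand-ins for embeddings of one reference answer and three sampled answers.
reference = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

rewards = semantic_rewards(reference, candidates)
advantages = group_relative_advantages(rewards)
```

In a real pipeline the toy vectors would be replaced by embeddings from an encoder, which avoids running a second LLM as a reward model; that is one plausible reading of where the reported cost saving comes from.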

Anthology ID: 2025.nllp-1.21
Volume: Proceedings of the Natural Legal Language Processing Workshop 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goanță, Daniel Preoțiuc-Pietro, Gerasimos Spanakis
Venues: NLLP | WS
Publisher: Association for Computational Linguistics
Pages: 304–316
URL: https://aclanthology.org/2025.nllp-1.21/
DOI: 10.18653/v1/2025.nllp-1.21
Cite (ACL): Pawitsapak Akarajaradwong, Chompakorn Chaksangchaichot, Pirat Pothavorn, Ekapol Chuangsuwanich, Attapol Rutherford, and Sarana Nutanong. 2025. Aligning LLMs for Thai Legal Question Answering with Efficient Semantic-Similarity Rewards. In Proceedings of the Natural Legal Language Processing Workshop 2025, pages 304–316, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Aligning LLMs for Thai Legal Question Answering with Efficient Semantic-Similarity Rewards (Akarajaradwong et al., NLLP 2025)
PDF: https://aclanthology.org/2025.nllp-1.21.pdf
