Shijia Jiang
2025
IntelliChain Stars at the Regulations Challenge Task: A Large Language Model for Financial Regulation
Shijia Jiang
|
Yongfu Dai
|
Haochen Jia
|
Yuxin Wang
|
Hao Wang
Proceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal)
We present our approach to the COLING-2025 Regulations Challenge, which evaluates large language models (LLMs) on nine regulatory tasks, such as abbreviation recognition and financial data extraction. To address challenges like domain-specific terminologies and dynamic regulatory contexts, we developed a robust data construction pipeline, integrating proprietary Chinese regulatory data, Fin-GPT datasets, and financial Q&A data. The pipeline applied, but was not limited to, language filtering, semantic screening, and deduplication, resulting in a 30,000-example dataset combining financial regulations and general financial data. Using this dataset, we fine-tuned Llama 3.2-3B-Instruct to create Reg-LLaMA, a specialized model that outperformed baselines on the Regulations Challenge and PIXIU datasets. These results demonstrate the effectiveness of domain-specific data construction in advancing LLMs for regulatory tasks, paving the way for reliable and interpretable AI in regulated industries.