Advancing CSR Theme and Topic Classification: LLMs and Training Enhancement Insights

Jens Van Nooten, Andriy Kosar


Abstract
In this paper, we present our results for the Corporate Social Responsibility (CSR) Themes and Topics classification shared task, which encompasses cross-lingual multi-class classification and monolingual multi-label classification. We examine the performance of multiple machine learning (ML) models, ranging from classical models to pre-trained large language models (LLMs), and assess the effectiveness of Data Augmentation (DA), Data Translation (DT), and Contrastive Learning (CL). We find that, on more complex classification tasks, state-of-the-art generative LLMs in a zero-shot setup still fall behind local models fine-tuned on enhanced datasets with additional training objectives. Our work provides a wide array of comparisons and highlights the value of using smaller language models for more complex classification tasks.
Anthology ID:
2024.finnlp-1.33
Volume:
Proceedings of the Joint Workshop of the 7th Financial Technology and Natural Language Processing, the 5th Knowledge Discovery from Unstructured Data in Financial Services, and the 4th Workshop on Economics and Natural Language Processing @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Chung-Chi Chen, Xiaomo Liu, Udo Hahn, Armineh Nourbakhsh, Zhiqiang Ma, Charese Smiley, Veronique Hoste, Sanjiv Ranjan Das, Manling Li, Mohammad Ghassemi, Hen-Hsen Huang, Hiroya Takamura, Hsin-Hsi Chen
Venues:
FinNLP | WS
Publisher:
ELRA and ICCL
Pages:
292–305
URL:
https://aclanthology.org/2024.finnlp-1.33
Cite (ACL):
Jens Van Nooten and Andriy Kosar. 2024. Advancing CSR Theme and Topic Classification: LLMs and Training Enhancement Insights. In Proceedings of the Joint Workshop of the 7th Financial Technology and Natural Language Processing, the 5th Knowledge Discovery from Unstructured Data in Financial Services, and the 4th Workshop on Economics and Natural Language Processing @ LREC-COLING 2024, pages 292–305, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Advancing CSR Theme and Topic Classification: LLMs and Training Enhancement Insights (Van Nooten & Kosar, FinNLP-WS 2024)
PDF:
https://aclanthology.org/2024.finnlp-1.33.pdf