Improving Cross-Lingual CSR Classification Using Pretrained Transformers with Variable Selection Networks and Data Augmentation

Shubham Sharma, Himanshu Janbandhu, Ankush Chopra


Abstract
This paper describes our submission to the Cross-Lingual Classification of Corporate Social Responsibility (CSR) Themes and Topics shared task, aiming to identify themes and fine-grained topics present in news articles. Classifying news articles poses several challenges, including limited training data, noisy articles, and longer context length. In this paper, we explore the potential of using pretrained transformer models to classify news articles into CSR themes and fine-grained topics. We propose two different approaches for these tasks. For multi-class classification of CSR themes, we suggest using a pretrained multi-lingual encoder-based model like microsoft/mDeBERTa-v3-base, along with a variable selection network to classify the article into CSR themes. To identify all fine-grained topics in each article, we propose using a pretrained encoder-based model like Longformer, which offers a higher context length. We employ chunking-based inference to avoid information loss in inference and experimented with using different parts and manifestation of original article for training and inference.
Anthology ID:
2024.finnlp-1.34
Volume:
Proceedings of the Joint Workshop of the 7th Financial Technology and Natural Language Processing, the 5th Knowledge Discovery from Unstructured Data in Financial Services, and the 4th Workshop on Economics and Natural Language Processing @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Chung-Chi Chen, Xiaomo Liu, Udo Hahn, Armineh Nourbakhsh, Zhiqiang Ma, Charese Smiley, Veronique Hoste, Sanjiv Ranjan Das, Manling Li, Mohammad Ghassemi, Hen-Hsen Huang, Hiroya Takamura, Hsin-Hsi Chen
Venues:
FinNLP | WS
SIG:
Publisher:
ELRA and ICCL
Note:
Pages:
306–318
Language:
URL:
https://aclanthology.org/2024.finnlp-1.34
DOI:
Bibkey:
Cite (ACL):
Shubham Sharma, Himanshu Janbandhu, and Ankush Chopra. 2024. Improving Cross-Lingual CSR Classification Using Pretrained Transformers with Variable Selection Networks and Data Augmentation. In Proceedings of the Joint Workshop of the 7th Financial Technology and Natural Language Processing, the 5th Knowledge Discovery from Unstructured Data in Financial Services, and the 4th Workshop on Economics and Natural Language Processing @ LREC-COLING 2024, pages 306–318, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Improving Cross-Lingual CSR Classification Using Pretrained Transformers with Variable Selection Networks and Data Augmentation (Sharma et al., FinNLP-WS 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.finnlp-1.34.pdf