Corpus Development Based on Conflict Structures in the Security Field and LLM Bias Verification

Keito Inoshita

doi:10.18653/v1/2024.nlp4dh-1.49

Abstract

This study investigates the presence of biases in large language models (LLMs), specifically focusing on how these models process and reflect inter-state conflict structures. Previous research has often lacked the standardized datasets necessary for a thorough and consistent evaluation of biases in this context. Without such datasets, it is challenging to accurately assess the impact of these biases on critical applications. To address this gap, we developed a diverse and high-quality corpus using a four-phase process. This process included generating texts based on international conflict-related keywords, enhancing emotional diversity to capture a broad spectrum of sentiments, validating the coherence and connections between texts, and conducting final quality assurance through human reviewers who are experts in natural language processing. Our analysis, conducted using this newly developed corpus, revealed subtle but significant negative biases in LLMs, particularly towards Eastern bloc countries such as Russia and China. These biases have the potential to influence decision-making processes in fields like national security and international relations, where accurate, unbiased information is crucial. The findings underscore the importance of evaluating and mitigating these biases to ensure the reliability and fairness of LLMs when applied in sensitive areas.

Anthology ID: 2024.nlp4dh-1.49
Volume: Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities
Month: November
Year: 2024
Address: Miami, USA
Editors: Mika Hämäläinen, Emily Öhman, So Miyagawa, Khalid Alnajjar, Yuri Bizzoni
Venues: NLP4DH | WS
Publisher: Association for Computational Linguistics
Pages: 504–512
URL: https://aclanthology.org/2024.nlp4dh-1.49/
DOI: 10.18653/v1/2024.nlp4dh-1.49
Cite (ACL): Keito Inoshita. 2024. Corpus Development Based on Conflict Structures in the Security Field and LLM Bias Verification. In Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities, pages 504–512, Miami, USA. Association for Computational Linguistics.
Cite (Informal): Corpus Development Based on Conflict Structures in the Security Field and LLM Bias Verification (Inoshita, NLP4DH 2024)
PDF: https://aclanthology.org/2024.nlp4dh-1.49.pdf
