Keito Inoshita
2024
Corpus Development Based on Conflict Structures in the Security Field and LLM Bias Verification
Keito Inoshita
Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities
This study investigates the presence of biases in large language models (LLMs), specifically focusing on how these models process and reflect inter-state conflict structures. Previous research has often lacked the standardized datasets necessary for a thorough and consistent evaluation of biases in this context. Without such datasets, it is challenging to accurately assess the impact of these biases on critical applications. To address this gap, we developed a diverse and high-quality corpus using a four-phase process. This process included generating texts based on international conflict-related keywords, enhancing emotional diversity to capture a broad spectrum of sentiments, validating the coherence and connections between texts, and conducting final quality assurance through human reviewers who are experts in natural language processing. Our analysis, conducted using this newly developed corpus, revealed subtle but significant negative biases in LLMs, particularly towards Eastern bloc countries such as Russia and China. These biases have the potential to influence decision-making processes in fields like national security and international relations, where accurate, unbiased information is crucial. The findings underscore the importance of evaluating and mitigating these biases to ensure the reliability and fairness of LLMs when applied in sensitive areas.