Exploring Multilingual Concepts of Human Values in Large Language Models: Is Value Alignment Consistent, Transferable and Controllable across Languages?

Shaoyang Xu, Weilong Dong, Zishan Guo, Xinwei Wu, Deyi Xiong


Abstract
Prior research has revealed that certain abstract concepts are linearly represented as directions in the representation space of LLMs, predominantly centered around English. In this paper, we extend this investigation to a multilingual context, with a specific focus on human values-related concepts (i.e., value concepts) due to their significance for AI safety. Through our comprehensive exploration covering 7 types of human values, 16 languages and 3 LLM series with distinct multilinguality (e.g., monolingual, bilingual and multilingual), we first empirically confirm the presence of value concepts within LLMs in a multilingual format. Further analysis on the cross-lingual characteristics of these concepts reveals 3 traits arising from language resource disparities: cross-lingual inconsistency, distorted linguistic relationships, and unidirectional cross-lingual transfer between high- and low-resource languages, all in terms of value concepts. Moreover, we validate the feasibility of cross-lingual control over value alignment capabilities of LLMs, leveraging the dominant language as a source language. Ultimately, recognizing the significant impact of LLMs’ multilinguality on our results, we consolidate our findings and provide prudent suggestions on the composition of multilingual data for LLMs pre-training.
Anthology ID:
2024.findings-emnlp.96
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
1771–1793
Language:
URL:
https://aclanthology.org/2024.findings-emnlp.96
DOI:
Bibkey:
Cite (ACL):
Shaoyang Xu, Weilong Dong, Zishan Guo, Xinwei Wu, and Deyi Xiong. 2024. Exploring Multilingual Concepts of Human Values in Large Language Models: Is Value Alignment Consistent, Transferable and Controllable across Languages?. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 1771–1793, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Exploring Multilingual Concepts of Human Values in Large Language Models: Is Value Alignment Consistent, Transferable and Controllable across Languages? (Xu et al., Findings 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.findings-emnlp.96.pdf
Software:
 2024.findings-emnlp.96.software.zip
Data:
 2024.findings-emnlp.96.data.zip