@inproceedings{gamboa-etal-2025-bias,
title = "Bias Attribution in {F}ilipino Language Models: Extending a Bias Interpretability Metric for Application on Agglutinative Languages",
author = "Gamboa, Lance Calvin Lim and
Feng, Yue and
Lee, Mark G.",
editor = "Fale{\'n}ska, Agnieszka and
Basta, Christine and
Costa-juss{\`a}, Marta and
Sta{\'n}czak, Karolina and
Nozza, Debora",
booktitle = "Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP)",
month = aug,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.gebnlp-1.19/",
doi = "10.18653/v1/2025.gebnlp-1.19",
pages = "195--205",
ISBN = "979-8-89176-277-0",
abstract = "Emerging research on bias attribution and interpretability have revealed how tokens contribute to biased behavior in language models processing English texts. We build on this line of inquiry by adapting the information-theoretic bias attribution score metric for implementation on models handling agglutinative languages{---}particularly Filipino. We then demonstrate the effectiveness of our adapted method by using it on a purely Filipino model and on three multilingual models{---}one trained on languages worldwide and two on Southeast Asian data. Our results show that Filipino models are driven towards bias by words pertaining to $\textit{people}$, $\textit{objects}$, and $\textit{relationships}${---}entity-based themes that stand in contrast to the action-heavy nature of bias-contributing themes in English (i.e., $\textit{criminal}$, $\textit{sexual}$, and $\textit{prosocial}$ behaviors). These findings point to differences in how English and non-English models process inputs linked to sociodemographic groups and bias."
}<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="gamboa-etal-2025-bias">
<titleInfo>
<title>Bias Attribution in Filipino Language Models: Extending a Bias Interpretability Metric for Application on Agglutinative Languages</title>
</titleInfo>
<name type="personal">
<namePart type="given">Lance</namePart>
<namePart type="given">Calvin</namePart>
<namePart type="given">Lim</namePart>
<namePart type="family">Gamboa</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yue</namePart>
<namePart type="family">Feng</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mark</namePart>
<namePart type="given">G</namePart>
<namePart type="family">Lee</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-08</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Agnieszka</namePart>
<namePart type="family">Faleńska</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Christine</namePart>
<namePart type="family">Basta</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Marta</namePart>
<namePart type="family">Costa-jussà</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Karolina</namePart>
<namePart type="family">Stańczak</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Debora</namePart>
<namePart type="family">Nozza</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Vienna, Austria</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-277-0</identifier>
</relatedItem>
<abstract>Emerging research on bias attribution and interpretability have revealed how tokens contribute to biased behavior in language models processing English texts. We build on this line of inquiry by adapting the information-theoretic bias attribution score metric for implementation on models handling agglutinative languages—particularly Filipino. We then demonstrate the effectiveness of our adapted method by using it on a purely Filipino model and on three multilingual models—one trained on languages worldwide and two on Southeast Asian data. Our results show that Filipino models are driven towards bias by words pertaining to people, objects, and relationships—entity-based themes that stand in contrast to the action-heavy nature of bias-contributing themes in English (i.e., criminal, sexual, and prosocial behaviors). These findings point to differences in how English and non-English models process inputs linked to sociodemographic groups and bias.</abstract>
<identifier type="citekey">gamboa-etal-2025-bias</identifier>
<identifier type="doi">10.18653/v1/2025.gebnlp-1.19</identifier>
<location>
<url>https://aclanthology.org/2025.gebnlp-1.19/</url>
</location>
<part>
<date>2025-08</date>
<extent unit="page">
<start>195</start>
<end>205</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Bias Attribution in Filipino Language Models: Extending a Bias Interpretability Metric for Application on Agglutinative Languages
%A Gamboa, Lance Calvin Lim
%A Feng, Yue
%A Lee, Mark G.
%Y Faleńska, Agnieszka
%Y Basta, Christine
%Y Costa-jussà, Marta
%Y Stańczak, Karolina
%Y Nozza, Debora
%S Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
%D 2025
%8 August
%I Association for Computational Linguistics
%C Vienna, Austria
%@ 979-8-89176-277-0
%F gamboa-etal-2025-bias
%X Emerging research on bias attribution and interpretability has revealed how tokens contribute to biased behavior in language models processing English texts. We build on this line of inquiry by adapting the information-theoretic bias attribution score metric for implementation on models handling agglutinative languages—particularly Filipino. We then demonstrate the effectiveness of our adapted method by using it on a purely Filipino model and on three multilingual models—one trained on languages worldwide and two on Southeast Asian data. Our results show that Filipino models are driven towards bias by words pertaining to people, objects, and relationships—entity-based themes that stand in contrast to the action-heavy nature of bias-contributing themes in English (i.e., criminal, sexual, and prosocial behaviors). These findings point to differences in how English and non-English models process inputs linked to sociodemographic groups and bias.
%R 10.18653/v1/2025.gebnlp-1.19
%U https://aclanthology.org/2025.gebnlp-1.19/
%U https://doi.org/10.18653/v1/2025.gebnlp-1.19
%P 195-205
Markdown (Informal)
[Bias Attribution in Filipino Language Models: Extending a Bias Interpretability Metric for Application on Agglutinative Languages](https://aclanthology.org/2025.gebnlp-1.19/) (Gamboa et al., GeBNLP 2025)
ACL
Lance Calvin Lim Gamboa, Yue Feng, and Mark G. Lee. 2025. Bias Attribution in Filipino Language Models: Extending a Bias Interpretability Metric for Application on Agglutinative Languages. In Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP), pages 195–205, Vienna, Austria. Association for Computational Linguistics.