On the Interpretability and Significance of Bias Metrics in Texts: a PMI-based Approach

Francisco Valentini, Germán Rosati, Damián Blasi, Diego Fernandez Slezak, Edgar Altszyler


Abstract
In recent years, word embeddings have been widely used to measure biases in texts. Although they have proven effective in detecting a wide variety of biases, metrics based on word embeddings lack transparency and interpretability. We analyze an alternative PMI-based metric to quantify biases in texts. It can be expressed as a function of conditional probabilities, which provides a simple interpretation in terms of word co-occurrences. We also prove that it can be approximated by an odds ratio, which allows estimating confidence intervals and the statistical significance of textual biases. This approach produces results similar to those of metrics based on word embeddings when capturing real-world gender gaps embedded in large corpora.
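The two properties described in the abstract can be illustrated with a minimal sketch. It assumes the bias of a target word w toward context group A over context group B is the difference PMI(w, A) - PMI(w, B), which reduces to log P(w|A) / P(w|B). That reading, the function names, and the co-occurrence counts are illustrative assumptions, not taken from the paper's code. The confidence interval is the standard Wald interval for a log odds ratio computed from a 2x2 contingency table.

```python
import math


def pmi_bias(c_wa, c_a, c_wb, c_b):
    """Difference of PMIs, written as a log ratio of conditional probabilities.

    c_wa: co-occurrences of the target word with context group A
    c_a:  total co-occurrences involving context group A
    c_wb, c_b: the same counts for context group B
    """
    p_w_given_a = c_wa / c_a
    p_w_given_b = c_wb / c_b
    return math.log(p_w_given_a / p_w_given_b)


def log_odds_ratio_ci(c_wa, c_a, c_wb, c_b, z=1.96):
    """Log odds ratio with a Wald confidence interval (z=1.96 gives about 95%)."""
    # Cells of the 2x2 table: (word, not word) x (group A, group B).
    a, b = c_wa, c_a - c_wa
    c, d = c_wb, c_b - c_wb
    log_or = math.log((a * d) / (b * c))
    std_err = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, log_or - z * std_err, log_or + z * std_err


# Hypothetical counts, chosen only for illustration.
bias = pmi_bias(30, 10_000, 10, 10_000)
log_or, ci_low, ci_high = log_odds_ratio_ci(30, 10_000, 10, 10_000)
print(f"PMI bias: {bias:.3f}")
print(f"log odds ratio: {log_or:.3f}, CI: ({ci_low:.3f}, {ci_high:.3f})")
```

When the target word is rare relative to both context groups, the two quantities are close, and an interval that excludes zero suggests the bias is statistically significant at the chosen level.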
Anthology ID:
2023.acl-short.44
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
509–520
URL:
https://aclanthology.org/2023.acl-short.44
DOI:
10.18653/v1/2023.acl-short.44
Cite (ACL):
Francisco Valentini, Germán Rosati, Damián Blasi, Diego Fernandez Slezak, and Edgar Altszyler. 2023. On the Interpretability and Significance of Bias Metrics in Texts: a PMI-based Approach. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 509–520, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
On the Interpretability and Significance of Bias Metrics in Texts: a PMI-based Approach (Valentini et al., ACL 2023)
PDF:
https://aclanthology.org/2023.acl-short.44.pdf
Video:
https://aclanthology.org/2023.acl-short.44.mp4