Bad Seeds: Evaluating Lexical Methods for Bias Measurement

Maria Antoniak, David Mimno


Abstract
A common factor in bias measurement methods is the use of hand-curated seed lexicons, but there remains little guidance for their selection. We gather seeds used in prior work, documenting their common sources and rationales, and in case studies of three English-language corpora, we enumerate the different types of social biases and linguistic features that, once encoded in the seeds, can affect subsequent bias measurements. Seeds developed in one context are often re-used in other contexts, but documentation and evaluation remain necessary precursors to relying on seeds for sensitive measurements.
Anthology ID:
2021.acl-long.148
Volume:
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:
August
Year:
2021
Address:
Online
Editors:
Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
Venues:
ACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
1889–1904
URL:
https://aclanthology.org/2021.acl-long.148
DOI:
10.18653/v1/2021.acl-long.148
Cite (ACL):
Maria Antoniak and David Mimno. 2021. Bad Seeds: Evaluating Lexical Methods for Bias Measurement. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1889–1904, Online. Association for Computational Linguistics.
Cite (Informal):
Bad Seeds: Evaluating Lexical Methods for Bias Measurement (Antoniak & Mimno, ACL-IJCNLP 2021)
PDF:
https://aclanthology.org/2021.acl-long.148.pdf
Optional supplementary material:
 2021.acl-long.148.OptionalSupplementaryMaterial.zip
Video:
 https://aclanthology.org/2021.acl-long.148.mp4
Code:
 maria-antoniak/bad-seeds