@inproceedings{b-etal-2021-overview,
title = "An Overview of Fairness in Data {--} Illuminating the Bias in Data Pipeline",
author = "B, Senthil Kumar and
Chandrabose, Aravindan and
Chakravarthi, Bharathi Raja",
editor = "Chakravarthi, Bharathi Raja and
McCrae, John P. and
Zarrouk, Manel and
Bali, Kalika and
Buitelaar, Paul",
booktitle = "Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion",
month = apr,
year = "2021",
address = "Kyiv",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.ltedi-1.5",
pages = "34--45",
abstract = "Data in general encodes human biases by default; being aware of this is a good start, and the research around how to handle it is ongoing. The term {`}bias{'} is extensively used in various contexts in NLP systems. In our research the focus is specific to biases such as gender, racism, religion, demographic and other intersectional views on biases that prevail in text processing systems responsible for systematically discriminating specific population, which is not ethical in NLP. These biases exacerbate the lack of equality, diversity and inclusion of specific population while utilizing the NLP applications. The tools and technology at the intermediate level utilize biased data, and transfer or amplify this bias to the downstream applications. However, it is not enough to be colourblind, gender-neutral alone when designing a unbiased technology {--} instead, we should take a conscious effort by designing a unified framework to measure and benchmark the bias. In this paper, we recommend six measures and one augment measure based on the observations of the bias in data, annotations, text representations and debiasing techniques.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="b-etal-2021-overview">
    <titleInfo>
      <title>An Overview of Fairness in Data – Illuminating the Bias in Data Pipeline</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Senthil</namePart>
      <namePart type="given">Kumar</namePart>
      <namePart type="family">B</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Aravindan</namePart>
      <namePart type="family">Chandrabose</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Bharathi</namePart>
      <namePart type="given">Raja</namePart>
      <namePart type="family">Chakravarthi</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2021-04</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Bharathi</namePart>
        <namePart type="given">Raja</namePart>
        <namePart type="family">Chakravarthi</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">John</namePart>
        <namePart type="given">P</namePart>
        <namePart type="family">McCrae</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Manel</namePart>
        <namePart type="family">Zarrouk</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Kalika</namePart>
        <namePart type="family">Bali</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Paul</namePart>
        <namePart type="family">Buitelaar</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Kyiv</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Data in general encodes human biases by default; being aware of this is a good start, and research on how to handle it is ongoing. The term ‘bias’ is used extensively in various contexts in NLP systems. Our research focuses specifically on biases such as gender, race, religion, demographic and other intersectional biases that prevail in text processing systems and systematically discriminate against specific populations, which is not ethical in NLP. These biases exacerbate the lack of equality, diversity and inclusion for specific populations using NLP applications. Tools and technology at the intermediate level use biased data and transfer or amplify this bias to downstream applications. However, it is not enough to be colourblind or gender-neutral alone when designing unbiased technology; instead, we should make a conscious effort to design a unified framework to measure and benchmark the bias. In this paper, we recommend six measures and one augmentation measure based on observations of bias in data, annotations, text representations and debiasing techniques.</abstract>
    <identifier type="citekey">b-etal-2021-overview</identifier>
    <location>
      <url>https://aclanthology.org/2021.ltedi-1.5</url>
    </location>
    <part>
      <date>2021-04</date>
      <extent unit="page">
        <start>34</start>
        <end>45</end>
      </extent>
    </part>
  </mods>
</modsCollection>
%0 Conference Proceedings
%T An Overview of Fairness in Data – Illuminating the Bias in Data Pipeline
%A B, Senthil Kumar
%A Chandrabose, Aravindan
%A Chakravarthi, Bharathi Raja
%Y Chakravarthi, Bharathi Raja
%Y McCrae, John P.
%Y Zarrouk, Manel
%Y Bali, Kalika
%Y Buitelaar, Paul
%S Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion
%D 2021
%8 April
%I Association for Computational Linguistics
%C Kyiv
%F b-etal-2021-overview
%X Data in general encodes human biases by default; being aware of this is a good start, and research on how to handle it is ongoing. The term ‘bias’ is used extensively in various contexts in NLP systems. Our research focuses specifically on biases such as gender, race, religion, demographic and other intersectional biases that prevail in text processing systems and systematically discriminate against specific populations, which is not ethical in NLP. These biases exacerbate the lack of equality, diversity and inclusion for specific populations using NLP applications. Tools and technology at the intermediate level use biased data and transfer or amplify this bias to downstream applications. However, it is not enough to be colourblind or gender-neutral alone when designing unbiased technology; instead, we should make a conscious effort to design a unified framework to measure and benchmark the bias. In this paper, we recommend six measures and one augmentation measure based on observations of bias in data, annotations, text representations and debiasing techniques.
%U https://aclanthology.org/2021.ltedi-1.5
%P 34-45
Markdown (Informal)
[An Overview of Fairness in Data – Illuminating the Bias in Data Pipeline](https://aclanthology.org/2021.ltedi-1.5) (B et al., LTEDI 2021)
ACL
Senthil Kumar B, Aravindan Chandrabose, and Bharathi Raja Chakravarthi. 2021. An Overview of Fairness in Data – Illuminating the Bias in Data Pipeline. In Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion, pages 34–45, Kyiv. Association for Computational Linguistics.