Milind Savagaonkar
2023
Bias Detection Using Textual Representation of Multimedia Contents
Karthik L. Nagar | Aditya Mohan Singh | Sowmya Rasipuram | Roshni Ramnani | Milind Savagaonkar | Anutosh Maitra
Proceedings of the 20th International Conference on Natural Language Processing (ICON)
The presence of biased and prejudicial content in social media has become a pressing concern, given its potential to inflict severe societal damage. Detecting and addressing such bias is imperative, as the rapid dissemination of skewed content has the capacity to disrupt social harmony. Advanced deep learning models are now paving the way for the automatic detection of bias in multimedia content with human-like accuracy. This paper focuses on identifying social bias in social media images. Toward this, we curated a Social Bias Image Dataset (SBID), consisting of 300 bias/no-bias images. The images contain both textual and visual information. We scientifically annotated the dataset for four different categories of bias. Our methodology involves generating a textual representation of the image content leveraging state-of-the-art models for optical character recognition (OCR), image captioning, and character attribute extraction. Initially, we fine-tuned a Bidirectional Encoder Representations from Transformers (BERT) network to classify bias and no-bias, as well as a Bidirectional and Auto-Regressive Transformers (BART) network for bias categorization, utilizing an extensive textual corpus. These networks were then further fine-tuned on SBID, the image dataset we built. The experimental findings presented herein underscore the effectiveness of these models in identifying various forms of bias in social media images. We also demonstrate their capacity to discern both explicit and implicit bias.
2022
Hollywood Identity Bias Dataset: A Context Oriented Bias Analysis of Movie Dialogues
Sandhya Singh | Prapti Roy | Nihar Sahoo | Niteesh Mallela | Himanshu Gupta | Pushpak Bhattacharyya | Milind Savagaonkar | Nidhi Sultan | Roshni Ramnani | Anutosh Maitra | Shubhashis Sengupta
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Movies reflect society and also hold the power to transform opinions. Social biases and stereotypes present in movies can cause extensive damage due to their reach. These biases are not always a demand of the storyline; they can creep in as the author's bias. Movie production houses would prefer to ascertain that any bias present in a script is the story's demand. Today, when deep learning models can achieve human-level accuracy in multiple tasks, an AI solution that identifies the biases present in a script at the writing stage can help them avoid the inconvenience of stalled releases, lawsuits, etc. Since AI solutions are data-intensive and no domain-specific data exists to address the problem of biases in scripts, we introduce a new dataset of movie scripts annotated for identity bias. The dataset contains dialogue turns annotated with (i) bias labels for seven categories, viz., gender, race/ethnicity, religion, age, occupation, LGBTQ, and other, which covers biases such as body shaming and personality bias; (ii) labels for sensitivity, stereotype, sentiment, emotion, and emotion intensity; (iii) context-aware annotation of all labels; (iv) target groups and reasons for bias labels; and (v) an expert-driven group-validation process for high-quality annotations. We also report various baseline performances for bias identification and category detection on our dataset.