At the Intersection of NLP and Sustainable Development: Exploring the Impact of Demographic-Aware Text Representations in Modeling Value on a Corpus of Interviews

Goya van Boven, Stephanie Hirmer, Costanza Conforti


Abstract
This research explores automated text classification using data from Low– and Middle–Income Countries (LMICs). In particular, we explore enhancing text representations with demographic information of speakers in a privacy-preserving manner. We introduce the Demographic-Rich Qualitative UPV-Interviews Dataset (DR-QI), a rich dataset of qualitative interviews from rural communities in India and Uganda. The interviews were conducted following the latest standards for respectful interactions with illiterate speakers (Hirmer et al., 2021a). The interviews were later sentence-annotated for Automated User-Perceived Value (UPV) Classification (Conforti et al., 2020), a schema that classifies values expressed by speakers, resulting in a dataset of 5,333 sentences. We perform the UPV classification task, which consists of predicting which values are expressed in a given sentence, on the new DR-QI dataset. We implement a classification model using DistilBERT (Sanh et al., 2019), which we extend with demographic information. In order to preserve the privacy of speakers, we investigate encoding demographic information using autoencoders. We find that adding demographic information improves performance, even if such information is encoded. In addition, we find that the performance per UPV is linked to the number of occurrences of that value in our data.
Anthology ID:
2022.lrec-1.216
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
SIG:
Publisher:
European Language Resources Association
Note:
Pages:
2007–2021
Language:
URL:
https://aclanthology.org/2022.lrec-1.216
DOI:
Bibkey:
Cite (ACL):
Goya van Boven, Stephanie Hirmer, and Costanza Conforti. 2022. At the Intersection of NLP and Sustainable Development: Exploring the Impact of Demographic-Aware Text Representations in Modeling Value on a Corpus of Interviews. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 2007–2021, Marseille, France. European Language Resources Association.
Cite (Informal):
At the Intersection of NLP and Sustainable Development: Exploring the Impact of Demographic-Aware Text Representations in Modeling Value on a Corpus of Interviews (van Boven et al., LREC 2022)
Copy Citation:
PDF:
https://aclanthology.org/2022.lrec-1.216.pdf
Code
 gvanboven/dr-qi