Understanding Narratives from Demographic Survey Data: a Comparative Study with Multiple Neural Topic Models

Xiao Xu, Gert Stulp, Antal Van Den Bosch, Anne Gauthier


Abstract
Fertility intentions as verbalized in surveys are a poor predictor of actual fertility outcomes, that is, the number of children people have. This can partly be explained by the uncertainty people have about their intentions. Such uncertainties are hard to capture through traditional survey questions, although open-ended questions can be used to gain insight into people’s subjective narratives of the future that determine their intentions. Answers to such open-ended questions can be analyzed with Natural Language Processing techniques. Traditional topic models (e.g., LSA and LDA), however, often fail to do so, since they rely on word co-occurrences, which are often rare in short survey responses. The aim of this study was to apply and evaluate topic models on demographic survey data. We applied neural topic models (e.g., BERTopic, CombinedTM) based on language models to responses from Dutch women on their fertility plans, and compared the topics and their coherence scores from each model to expert judgments. Our results show that neural models produce topics more in line with human interpretation than LDA does. However, the coherence score only partly reflected this, depending on the corpus used for calculation. This research is important because, first, it helps us develop more informed strategies on model selection and evaluation for topic modeling on survey data; and second, it shows that the field of demography has much to gain from adopting NLP methods.
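To illustrate why a coherence score can depend on the corpus used to calculate it, here is a minimal sketch. It assumes NPMI coherence with document-level co-occurrence counts; the abstract does not say which coherence measure was used, so this choice is an assumption. The function name `npmi_coherence`, the topic words, and both toy corpora are invented for illustration and are not data or results from the study.

```python
import math
from itertools import combinations


def npmi_coherence(topic_words, reference_docs, eps=1e-12):
    """Average NPMI over all word pairs of one topic.

    Probabilities are estimated as the share of reference documents
    that contain a word (or both words of a pair).
    """
    doc_sets = [set(doc.lower().split()) for doc in reference_docs]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents containing all given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            # Words never co-occur: use the NPMI minimum by convention.
            scores.append(-1.0)
        else:
            pmi = math.log(p12 / (p1 * p2))
            # eps avoids division by zero when both words occur everywhere.
            scores.append(pmi / (-math.log(p12) + eps))
    return sum(scores) / len(scores)


# Hypothetical topic and toy corpora (invented, not from the survey).
topic = ["study", "job", "children"]

in_domain_docs = [
    "first finish study then children",
    "want stable job before children",
    "partner and i want children soon",
    "finish study and find job",
]
general_docs = [
    "the study of children in school",
    "job market report",
    "children play outside",
    "new study on climate",
]

in_domain_score = npmi_coherence(topic, in_domain_docs)
general_score = npmi_coherence(topic, general_docs)
print(f"in-domain reference corpus: {in_domain_score:.3f}")
print(f"general reference corpus:   {general_score:.3f}")
```

The same topic gets a different score under each reference corpus, because the co-occurrence statistics behind the score come from the reference corpus and not from the topic model itself.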
Anthology ID:
2022.nlpcss-1.4
Volume:
Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS)
Month:
November
Year:
2022
Address:
Abu Dhabi, UAE
Editors:
David Bamman, Dirk Hovy, David Jurgens, Katherine Keith, Brendan O'Connor, Svitlana Volkova
Venue:
NLP+CSS
Publisher:
Association for Computational Linguistics
Pages:
33–38
URL:
https://aclanthology.org/2022.nlpcss-1.4
DOI:
10.18653/v1/2022.nlpcss-1.4
Cite (ACL):
Xiao Xu, Gert Stulp, Antal Van Den Bosch, and Anne Gauthier. 2022. Understanding Narratives from Demographic Survey Data: a Comparative Study with Multiple Neural Topic Models. In Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS), pages 33–38, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Understanding Narratives from Demographic Survey Data: a Comparative Study with Multiple Neural Topic Models (Xu et al., NLP+CSS 2022)
PDF:
https://aclanthology.org/2022.nlpcss-1.4.pdf