Mining Social Science Publications for Survey Variables

Andrea Zielinski, Peter Mutschke


Abstract
Research in Social Science is usually based on survey data, where individual research questions relate to observable concepts (variables). However, due to a lack of standards for data citations, reliably identifying the variables used is often difficult. In this paper, we present a work-in-progress study that seeks to solve the variable detection task with supervised machine learning algorithms, using a linguistic analysis pipeline to extract a rich feature set that includes terminological concepts and similarity metric scores. Further, we present preliminary results on a small dataset specifically designed for this task, yielding a significant increase in performance over the random baseline.
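Illustrative sketch (not the authors' implementation): the abstract mentions similarity metric scores as one kind of feature for variable detection. The Python snippet below shows, under stated assumptions, how a single such score could be computed between a sentence from a publication and candidate survey variable labels. Jaccard token overlap is a stand-in metric chosen for simplicity; the abstract does not name the metrics actually used. The function names, variable identifiers, labels, and example sentence are all hypothetical.

```python
import re


def tokens(text):
    """Lowercase a string and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_variables(sentence, variables):
    """Return (variable_id, score) pairs sorted by descending similarity."""
    sent = tokens(sentence)
    scored = [(vid, jaccard(sent, tokens(label))) for vid, label in variables.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical survey variables and sentence, invented for illustration only.
VARIABLES = {
    "v1": "trust in the national parliament",
    "v2": "satisfaction with household income",
    "v3": "frequency of attending religious services",
}
SENTENCE = "Respondents reported low trust in parliament."

RANKING = rank_variables(SENTENCE, VARIABLES)
for variable_id, score in RANKING:
    print(variable_id, round(score, 3))
```

In a supervised setting such as the one the abstract describes, a score like this would presumably be one feature among several passed to a classifier, rather than a decision rule on its own.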
Anthology ID:
W17-2907
Volume:
Proceedings of the Second Workshop on NLP and Computational Social Science
Month:
August
Year:
2017
Address:
Vancouver, Canada
Editors:
Dirk Hovy, Svitlana Volkova, David Bamman, David Jurgens, Brendan O’Connor, Oren Tsur, A. Seza Doğruöz
Venue:
NLP+CSS
Publisher:
Association for Computational Linguistics
Pages:
47–52
URL:
https://aclanthology.org/W17-2907
DOI:
10.18653/v1/W17-2907
Cite (ACL):
Andrea Zielinski and Peter Mutschke. 2017. Mining Social Science Publications for Survey Variables. In Proceedings of the Second Workshop on NLP and Computational Social Science, pages 47–52, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Mining Social Science Publications for Survey Variables (Zielinski & Mutschke, NLP+CSS 2017)
PDF:
https://aclanthology.org/W17-2907.pdf