Mining Social Science Publications for Survey Variables

Andrea Zielinski; Peter Mutschke

doi:10.18653/v1/W17-2907

Abstract

Research in Social Science is usually based on survey data where individual research questions relate to observable concepts (variables). However, due to a lack of standards for data citations, reliable identification of the variables used is often difficult. In this paper, we present a work-in-progress study that seeks to provide a solution to the variable detection task based on supervised machine learning algorithms, using a linguistic analysis pipeline to extract a rich feature set, including terminological concepts and similarity metric scores. Further, we present preliminary results on a small dataset that has been specifically designed for this task, yielding a significant increase in performance over the random baseline.
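The abstract lists similarity metric scores among the classifier features. As a rough illustration only, assuming a sentence from a publication is compared against the label of a candidate survey variable, two common similarity scores (token Jaccard overlap and bag-of-words cosine) could be computed as below. The function names, the simple tokenizer, and the choice of these two metrics are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-word characters (a deliberately simple, hypothetical tokenizer)."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def jaccard(a: list[str], b: list[str]) -> float:
    """Share of distinct tokens the two texts have in common."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def cosine(a: list[str], b: list[str]) -> float:
    """Cosine similarity of raw term-frequency vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def similarity_features(sentence: str, variable_label: str) -> dict[str, float]:
    """Hypothetical feature dict for one (sentence, variable) pair, to be fed to a supervised classifier."""
    s, v = tokenize(sentence), tokenize(variable_label)
    return {"jaccard": jaccard(s, v), "cosine": cosine(s, v)}


if __name__ == "__main__":
    print(similarity_features("Respondents reported their trust in parliament.", "Trust in parliament"))
```

For the example pair, the three variable tokens all occur in the six-token sentence, so Jaccard is 3/6 = 0.5 and cosine is 3/(√6·√3) ≈ 0.707. In a real pipeline such scores would sit alongside other features (e.g. terminological concepts) rather than decide a match on their own.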

Anthology ID: W17-2907
Volume: Proceedings of the Second Workshop on NLP and Computational Social Science
Month: August
Year: 2017
Address: Vancouver, Canada
Editors: Dirk Hovy, Svitlana Volkova, David Bamman, David Jurgens, Brendan O’Connor, Oren Tsur, A. Seza Doğruöz
Venue: NLP+CSS
Publisher: Association for Computational Linguistics
Pages: 47–52
URL: https://aclanthology.org/W17-2907/
DOI: 10.18653/v1/W17-2907
Bibkey: zielinski-mutschke-2017-mining
Cite (ACL): Andrea Zielinski and Peter Mutschke. 2017. Mining Social Science Publications for Survey Variables. In Proceedings of the Second Workshop on NLP and Computational Social Science, pages 47–52, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal): Mining Social Science Publications for Survey Variables (Zielinski & Mutschke, NLP+CSS 2017)
PDF: https://aclanthology.org/W17-2907.pdf

Export citation

BibTeX

@inproceedings{zielinski-mutschke-2017-mining,
    title = "Mining Social Science Publications for Survey Variables",
    author = "Zielinski, Andrea  and
      Mutschke, Peter",
    editor = {Hovy, Dirk  and
      Volkova, Svitlana  and
      Bamman, David  and
      Jurgens, David  and
      O{'}Connor, Brendan  and
      Tsur, Oren  and
      Do{\u{g}}ru{\"o}z, A. Seza},
    booktitle = "Proceedings of the Second Workshop on {NLP} and Computational Social Science",
    month = aug,
    year = "2017",
    address = "Vancouver, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W17-2907/",
    doi = "10.18653/v1/W17-2907",
    pages = "47--52",
    abstract = "Research in Social Science is usually based on survey data where individual research questions relate to observable concepts (variables). However, due to a lack of standards for data citations a reliable identification of the variables used is often difficult. In this paper, we present a work-in-progress study that seeks to provide a solution to the variable detection task based on supervised machine learning algorithms, using a linguistic analysis pipeline to extract a rich feature set, including terminological concepts and similarity metric scores. Further, we present preliminary results on a small dataset that has been specifically designed for this task, yielding a significant increase in performance over the random baseline."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="zielinski-mutschke-2017-mining">
    <titleInfo>
        <title>Mining Social Science Publications for Survey Variables</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Andrea</namePart>
        <namePart type="family">Zielinski</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Peter</namePart>
        <namePart type="family">Mutschke</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2017-08</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the Second Workshop on NLP and Computational Social Science</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Dirk</namePart>
            <namePart type="family">Hovy</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Svitlana</namePart>
            <namePart type="family">Volkova</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">David</namePart>
            <namePart type="family">Bamman</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">David</namePart>
            <namePart type="family">Jurgens</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Brendan</namePart>
            <namePart type="family">O’Connor</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Oren</namePart>
            <namePart type="family">Tsur</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">A</namePart>
            <namePart type="given">Seza</namePart>
            <namePart type="family">Doğruöz</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Vancouver, Canada</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Research in Social Science is usually based on survey data where individual research questions relate to observable concepts (variables). However, due to a lack of standards for data citations a reliable identification of the variables used is often difficult. In this paper, we present a work-in-progress study that seeks to provide a solution to the variable detection task based on supervised machine learning algorithms, using a linguistic analysis pipeline to extract a rich feature set, including terminological concepts and similarity metric scores. Further, we present preliminary results on a small dataset that has been specifically designed for this task, yielding a significant increase in performance over the random baseline.</abstract>
    <identifier type="citekey">zielinski-mutschke-2017-mining</identifier>
    <identifier type="doi">10.18653/v1/W17-2907</identifier>
    <location>
        <url>https://aclanthology.org/W17-2907/</url>
    </location>
    <part>
        <date>2017-08</date>
        <extent unit="page">
            <start>47</start>
            <end>52</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Mining Social Science Publications for Survey Variables
%A Zielinski, Andrea
%A Mutschke, Peter
%Y Hovy, Dirk
%Y Volkova, Svitlana
%Y Bamman, David
%Y Jurgens, David
%Y O’Connor, Brendan
%Y Tsur, Oren
%Y Doğruöz, A. Seza
%S Proceedings of the Second Workshop on NLP and Computational Social Science
%D 2017
%8 August
%I Association for Computational Linguistics
%C Vancouver, Canada
%F zielinski-mutschke-2017-mining
%X Research in Social Science is usually based on survey data where individual research questions relate to observable concepts (variables). However, due to a lack of standards for data citations a reliable identification of the variables used is often difficult. In this paper, we present a work-in-progress study that seeks to provide a solution to the variable detection task based on supervised machine learning algorithms, using a linguistic analysis pipeline to extract a rich feature set, including terminological concepts and similarity metric scores. Further, we present preliminary results on a small dataset that has been specifically designed for this task, yielding a significant increase in performance over the random baseline.
%R 10.18653/v1/W17-2907
%U https://aclanthology.org/W17-2907/
%U https://doi.org/10.18653/v1/W17-2907
%P 47-52

Markdown (Informal)

[Mining Social Science Publications for Survey Variables](https://aclanthology.org/W17-2907/) (Zielinski & Mutschke, NLP+CSS 2017)