Learning to Define Terms in the Software Domain

Vidhisha Balachandran; Dheeraj Rajagopal; Rose Catherine Kanjirathinkal; William Cohen

doi:10.18653/v1/W18-6122

Abstract

One way to test a person’s knowledge of a domain is to ask them to define domain-specific terms. Here, we investigate the task of automatically generating definitions of technical terms by reading text from the technical domain. Specifically, we learn definitions of software entities from a large corpus built from the user forum Stack Overflow. To model definitions, we train a language model and incorporate additional domain-specific information such as word co-occurrence and ontological category information. Our approach improves on previous baselines by 2 BLEU points on the definition generation task. Our experiments also show the additional challenges associated with the task and the shortcomings of language-model-based architectures for definition generation.
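The abstract reports its gain in BLEU points. This page does not state which BLEU configuration was used (tokenization, smoothing, corpus- versus sentence-level averaging), so the following is only a minimal Python sketch of the standard corpus-level BLEU-4 formulation, assuming pre-tokenized text, a single reference definition per generated definition, uniform n-gram weights, and no smoothing. It illustrates what a "BLEU point" measures and is not the authors' evaluation script; the function names `ngrams` and `corpus_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU in [0, 1].

    Assumptions (not from the paper): one tokenized reference per
    hypothesis, uniform weights over 1..max_n-grams, no smoothing.
    """
    clipped = [0] * max_n  # clipped n-gram matches, per order
    total = [0] * max_n    # n-grams in the hypotheses, per order
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Each hypothesis n-gram counts at most as often as it occurs in the reference.
            clipped[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total[n - 1] += sum(hyp_counts.values())
    if min(clipped) == 0:
        # Without smoothing, any zero precision makes the geometric mean zero.
        return 0.0
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, total)) / max_n
    # Brevity penalty: penalize output that is shorter overall than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)
```

A hypothesis identical to its reference (at least four tokens long) scores 1.0, and one that shares no 4-gram with its reference scores 0.0 because no smoothing is applied. On the usual 0 to 100 reporting scale, a 2-point gain corresponds to a difference of 0.02 in the value this function returns.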

Anthology ID: W18-6122
Volume: Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text
Month: November
Year: 2018
Address: Brussels, Belgium
Editors: Wei Xu, Alan Ritter, Tim Baldwin, Afshin Rahimi
Venue: WNUT
Publisher: Association for Computational Linguistics
Pages: 164–172
URL: https://aclanthology.org/W18-6122/
DOI: 10.18653/v1/W18-6122
Bibkey: balachandran-etal-2018-learning
Cite (ACL): Vidhisha Balachandran, Dheeraj Rajagopal, Rose Catherine Kanjirathinkal, and William Cohen. 2018. Learning to Define Terms in the Software Domain. In Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text, pages 164–172, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal): Learning to Define Terms in the Software Domain (Balachandran et al., WNUT 2018)
PDF: https://aclanthology.org/W18-6122.pdf

Export citation

BibTeX

@inproceedings{balachandran-etal-2018-learning,
    title = "Learning to Define Terms in the Software Domain",
    author = "Balachandran, Vidhisha  and
      Rajagopal, Dheeraj  and
      Kanjirathinkal, Rose Catherine  and
      Cohen, William",
    editor = "Xu, Wei  and
      Ritter, Alan  and
      Baldwin, Tim  and
      Rahimi, Afshin",
    booktitle = "Proceedings of the 2018 {EMNLP} Workshop W-{NUT}: The 4th Workshop on Noisy User-generated Text",
    month = nov,
    year = "2018",
    address = "Brussels, Belgium",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W18-6122/",
    doi = "10.18653/v1/W18-6122",
    pages = "164--172",
    abstract = "One way to test a person{'}s knowledge of a domain is to ask them to define domain-specific terms. Here, we investigate the task of automatically generating definitions of technical terms by reading text from the technical domain. Specifically, we learn definitions of software entities from a large corpus built from the user forum Stack Overflow. To model definitions, we train a language model and incorporate additional domain-specific information such as word co-occurrence and ontological category information. Our approach improves on previous baselines by 2 BLEU points on the definition generation task. Our experiments also show the additional challenges associated with the task and the shortcomings of language-model-based architectures for definition generation."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="balachandran-etal-2018-learning">
    <titleInfo>
        <title>Learning to Define Terms in the Software Domain</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Vidhisha</namePart>
        <namePart type="family">Balachandran</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Dheeraj</namePart>
        <namePart type="family">Rajagopal</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Rose</namePart>
        <namePart type="given">Catherine</namePart>
        <namePart type="family">Kanjirathinkal</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">William</namePart>
        <namePart type="family">Cohen</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2018-11</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Wei</namePart>
            <namePart type="family">Xu</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Alan</namePart>
            <namePart type="family">Ritter</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Tim</namePart>
            <namePart type="family">Baldwin</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Afshin</namePart>
            <namePart type="family">Rahimi</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Brussels, Belgium</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>One way to test a person’s knowledge of a domain is to ask them to define domain-specific terms. Here, we investigate the task of automatically generating definitions of technical terms by reading text from the technical domain. Specifically, we learn definitions of software entities from a large corpus built from the user forum Stack Overflow. To model definitions, we train a language model and incorporate additional domain-specific information such as word co-occurrence and ontological category information. Our approach improves on previous baselines by 2 BLEU points on the definition generation task. Our experiments also show the additional challenges associated with the task and the shortcomings of language-model-based architectures for definition generation.</abstract>
    <identifier type="citekey">balachandran-etal-2018-learning</identifier>
    <identifier type="doi">10.18653/v1/W18-6122</identifier>
    <location>
        <url>https://aclanthology.org/W18-6122/</url>
    </location>
    <part>
        <date>2018-11</date>
        <extent unit="page">
            <start>164</start>
            <end>172</end>
        </extent>
    </part>
</mods>
</modsCollection>
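The MODS record above lists authors as `<name>` elements directly under `<mods>` and editors as `<name>` elements inside `<relatedItem type="host">`, distinguishing the two by the text of `<roleTerm>`. As a minimal sketch, assuming Python's standard `xml.etree.ElementTree` and a hypothetical helper `names_by_role`, either list can be read back out by role, joining multiple given-name parts (as in "Rose Catherine") before the family name:

```python
import xml.etree.ElementTree as ET

# MODS v3 namespace, as declared on <modsCollection> in the record above.
NS = {"m": "http://www.loc.gov/mods/v3"}


def names_by_role(root: ET.Element, role: str) -> list[str]:
    """Return 'Given ... Family' strings for every <name> whose roleTerm equals role.

    Hypothetical helper: walks all <name> elements in document order, so it
    finds authors (children of <mods>) and editors (inside <relatedItem>).
    """
    people = []
    for name in root.iter("{http://www.loc.gov/mods/v3}name"):
        term = name.find("m:role/m:roleTerm", NS)
        if term is None or term.text != role:
            continue
        # A name may carry several given parts, e.g. "Rose" and "Catherine".
        given = [p.text for p in name.findall("m:namePart[@type='given']", NS)]
        family = [p.text for p in name.findall("m:namePart[@type='family']", NS)]
        people.append(" ".join(given + family))
    return people
```

Applied to the full record (for instance loaded with `ET.parse(path).getroot()`, where `path` is wherever the export was saved), `names_by_role(root, "author")` should return the four authors in the order listed, with "Rose Catherine Kanjirathinkal" kept as a single name, and `names_by_role(root, "editor")` the four editors.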

Endnote

%0 Conference Proceedings
%T Learning to Define Terms in the Software Domain
%A Balachandran, Vidhisha
%A Rajagopal, Dheeraj
%A Kanjirathinkal, Rose Catherine
%A Cohen, William
%Y Xu, Wei
%Y Ritter, Alan
%Y Baldwin, Tim
%Y Rahimi, Afshin
%S Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text
%D 2018
%8 November
%I Association for Computational Linguistics
%C Brussels, Belgium
%F balachandran-etal-2018-learning
%X One way to test a person’s knowledge of a domain is to ask them to define domain-specific terms. Here, we investigate the task of automatically generating definitions of technical terms by reading text from the technical domain. Specifically, we learn definitions of software entities from a large corpus built from the user forum Stack Overflow. To model definitions, we train a language model and incorporate additional domain-specific information such as word co-occurrence and ontological category information. Our approach improves on previous baselines by 2 BLEU points on the definition generation task. Our experiments also show the additional challenges associated with the task and the shortcomings of language-model-based architectures for definition generation.
%R 10.18653/v1/W18-6122
%U https://aclanthology.org/W18-6122/
%U https://doi.org/10.18653/v1/W18-6122
%P 164-172

Markdown (Informal)

[Learning to Define Terms in the Software Domain](https://aclanthology.org/W18-6122/) (Balachandran et al., WNUT 2018)

ACL

Vidhisha Balachandran, Dheeraj Rajagopal, Rose Catherine Kanjirathinkal, and William Cohen. 2018. Learning to Define Terms in the Software Domain. In Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text, pages 164–172, Brussels, Belgium. Association for Computational Linguistics.