Recognition of non-domain phrases in automatically extracted lists of terms

Agnieszka Mykowiecka; Malgorzata Marciniak; Piotr Rychlik

Abstract

In this paper, we address the problem of recognizing non-domain phrases in terminology lists obtained with an automatic term extraction tool. We focus on identifying multi-word phrases that are general terms and discourse function expressions. We tested several methods based on domain corpora comparison and a method based on the contexts of phrases identified in a large corpus of general language. We compared the results of the methods to manual annotation. The results show that the task is quite hard, as the inter-annotator agreement is low. Several tested methods achieved similar overall results, although the phrase ordering varied between methods. The most successful method, with a precision of about 0.75 at the halfway point of the tested list, was the context-based method using a modified contextual diversity coefficient.
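The "precision at the half of the tested list" figure can be illustrated with a minimal sketch. It assumes that a method ranks candidate phrases from most to least likely non-domain, and that precision is the share of manually confirmed non-domain phrases within the top half of that ranking. This reading, the function name `precision_at_fraction`, and the toy labels are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def precision_at_fraction(ranked_labels: Sequence[bool], fraction: float = 0.5) -> float:
    """Share of True labels within the top `fraction` of a ranked list.

    ranked_labels[i] is True when the i-th ranked phrase was manually
    annotated as non-domain (assumed interpretation, see lead-in).
    """
    if not ranked_labels:
        raise ValueError("ranked_labels must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Keep at least one item so very short lists still yield a value.
    cutoff = max(1, int(len(ranked_labels) * fraction))
    top = ranked_labels[:cutoff]
    return sum(top) / len(top)


# Hypothetical toy ranking of eight phrases, best-scored first.
ranked = [True, True, False, True, False, False, True, False]
print(precision_at_fraction(ranked))  # top 4 items contain 3 True labels
```

Evaluating at a cutoff rather than over the whole list reflects that only the ordering differs between methods; the overall set of phrases being ranked is the same.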

Anthology ID: W16-4703
Volume: Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016)
Month: December
Year: 2016
Address: Osaka, Japan
Editors: Patrick Drouin, Natalia Grabar, Thierry Hamon, Kyo Kageura, Koichi Takeuchi
Venue: CompuTerm
Publisher: The COLING 2016 Organizing Committee
Pages: 12–20
URL: https://aclanthology.org/W16-4703/
Cite (ACL): Agnieszka Mykowiecka, Malgorzata Marciniak, and Piotr Rychlik. 2016. Recognition of non-domain phrases in automatically extracted lists of terms. In Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016), pages 12–20, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal): Recognition of non-domain phrases in automatically extracted lists of terms (Mykowiecka et al., CompuTerm 2016)
PDF: https://aclanthology.org/W16-4703.pdf

BibTeX

@inproceedings{mykowiecka-etal-2016-recognition,
    title = "Recognition of non-domain phrases in automatically extracted lists of terms",
    author = "Mykowiecka, Agnieszka  and
      Marciniak, Malgorzata  and
      Rychlik, Piotr",
    editor = "Drouin, Patrick  and
      Grabar, Natalia  and
      Hamon, Thierry  and
      Kageura, Kyo  and
      Takeuchi, Koichi",
    booktitle = "Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016)",
    month = dec,
    year = "2016",
    address = "Osaka, Japan",
    publisher = "The COLING 2016 Organizing Committee",
    url = "https://aclanthology.org/W16-4703/",
    pages = "12--20",
    abstract = "In the paper, we address the problem of recognition of non-domain phrases in terminology lists obtained with an automatic term extraction tool. We focus on identification of multi-word phrases that are general terms and discourse function expressions. We tested several methods based on domain corpora comparison and a method based on contexts of phrases identified in a large corpus of general language. We compared the results of the methods to manual annotation. The results show that the task is quite hard as the inter-annotator agreement is low. Several tested methods achieved similar overall results, although the phrase ordering varied between methods. The most successful method with the precision about 0.75 at the half of the tested list was the context based method using a modified contextual diversity coefficient."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="mykowiecka-etal-2016-recognition">
    <titleInfo>
        <title>Recognition of non-domain phrases in automatically extracted lists of terms</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Agnieszka</namePart>
        <namePart type="family">Mykowiecka</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Malgorzata</namePart>
        <namePart type="family">Marciniak</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Piotr</namePart>
        <namePart type="family">Rychlik</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2016-12</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016)</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Patrick</namePart>
            <namePart type="family">Drouin</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Natalia</namePart>
            <namePart type="family">Grabar</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Thierry</namePart>
            <namePart type="family">Hamon</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Kyo</namePart>
            <namePart type="family">Kageura</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Koichi</namePart>
            <namePart type="family">Takeuchi</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>The COLING 2016 Organizing Committee</publisher>
            <place>
                <placeTerm type="text">Osaka, Japan</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>In the paper, we address the problem of recognition of non-domain phrases in terminology lists obtained with an automatic term extraction tool. We focus on identification of multi-word phrases that are general terms and discourse function expressions. We tested several methods based on domain corpora comparison and a method based on contexts of phrases identified in a large corpus of general language. We compared the results of the methods to manual annotation. The results show that the task is quite hard as the inter-annotator agreement is low. Several tested methods achieved similar overall results, although the phrase ordering varied between methods. The most successful method with the precision about 0.75 at the half of the tested list was the context based method using a modified contextual diversity coefficient.</abstract>
    <identifier type="citekey">mykowiecka-etal-2016-recognition</identifier>
    <location>
        <url>https://aclanthology.org/W16-4703/</url>
    </location>
    <part>
        <date>2016-12</date>
        <extent unit="page">
            <start>12</start>
            <end>20</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Recognition of non-domain phrases in automatically extracted lists of terms
%A Mykowiecka, Agnieszka
%A Marciniak, Malgorzata
%A Rychlik, Piotr
%Y Drouin, Patrick
%Y Grabar, Natalia
%Y Hamon, Thierry
%Y Kageura, Kyo
%Y Takeuchi, Koichi
%S Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016)
%D 2016
%8 December
%I The COLING 2016 Organizing Committee
%C Osaka, Japan
%F mykowiecka-etal-2016-recognition
%X In the paper, we address the problem of recognition of non-domain phrases in terminology lists obtained with an automatic term extraction tool. We focus on identification of multi-word phrases that are general terms and discourse function expressions. We tested several methods based on domain corpora comparison and a method based on contexts of phrases identified in a large corpus of general language. We compared the results of the methods to manual annotation. The results show that the task is quite hard as the inter-annotator agreement is low. Several tested methods achieved similar overall results, although the phrase ordering varied between methods. The most successful method with the precision about 0.75 at the half of the tested list was the context based method using a modified contextual diversity coefficient.
%U https://aclanthology.org/W16-4703/
%P 12-20

Markdown (Informal)

[Recognition of non-domain phrases in automatically extracted lists of terms](https://aclanthology.org/W16-4703/) (Mykowiecka et al., CompuTerm 2016)