Lexical Correction of Polish Twitter Political Data

Maciej Ogrodniczuk; Mateusz Kopeć

doi:10.18653/v1/W17-2215

Abstract

Language processing architectures are often evaluated in near-perfect conditions with respect to the processed content. Tools that perform sufficiently well on electronic press, books and other types of non-interactive content may poorly handle littered, colloquial and multilingual textual data, which make up the majority of communication today. This paper investigates how Polish Twitter data (in a slightly controlled ‘political’ flavour) differ from the expectations of linguistic tools and how they could be corrected to be ready for processing by standard language processing chains available for Polish. The setting includes specialised components for spelling correction of tweets as well as hashtag and username decoding.

Anthology ID: W17-2215
Volume: Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Month: August
Year: 2017
Address: Vancouver, Canada
Editors: Beatrice Alex, Stefania Degaetano-Ortlieb, Anna Feldman, Anna Kazantseva, Nils Reiter, Stan Szpakowicz
Venue: LaTeCH
SIG: SIGHUM
Publisher: Association for Computational Linguistics
Pages: 115–125
URL: https://aclanthology.org/W17-2215/
DOI: 10.18653/v1/W17-2215
Bibkey: ogrodniczuk-kopec-2017-lexical
Cite (ACL): Maciej Ogrodniczuk and Mateusz Kopeć. 2017. Lexical Correction of Polish Twitter Political Data. In Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, pages 115–125, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal): Lexical Correction of Polish Twitter Political Data (Ogrodniczuk & Kopeć, LaTeCH 2017)
PDF: https://aclanthology.org/W17-2215.pdf

Export citation

BibTeX

@inproceedings{ogrodniczuk-kopec-2017-lexical,
    title = "Lexical Correction of {P}olish {T}witter Political Data",
    author = "Ogrodniczuk, Maciej  and
      Kope{\'c}, Mateusz",
    editor = "Alex, Beatrice  and
      Degaetano-Ortlieb, Stefania  and
      Feldman, Anna  and
      Kazantseva, Anna  and
      Reiter, Nils  and
      Szpakowicz, Stan",
    booktitle = "Proceedings of the Joint {SIGHUM} Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature",
    month = aug,
    year = "2017",
    address = "Vancouver, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W17-2215/",
    doi = "10.18653/v1/W17-2215",
    pages = "115--125",
    abstract = "Language processing architectures are often evaluated in near-to-perfect conditions with respect to processed content. The tools which perform sufficiently well on electronic press, books and other type of non-interactive content may poorly handle littered, colloquial and multilingual textual data which make the majority of communication today. This paper aims at investigating how Polish Twitter data (in a slightly controlled `political' flavour) differs from expectation of linguistic tools and how they could be corrected to be ready for processing by standard language processing chains available for Polish. The setting includes specialised components for spelling correction of tweets as well as hashtag and username decoding."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="ogrodniczuk-kopec-2017-lexical">
    <titleInfo>
        <title>Lexical Correction of Polish Twitter Political Data</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Maciej</namePart>
        <namePart type="family">Ogrodniczuk</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Mateusz</namePart>
        <namePart type="family">Kopeć</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2017-08</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Beatrice</namePart>
            <namePart type="family">Alex</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Stefania</namePart>
            <namePart type="family">Degaetano-Ortlieb</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Anna</namePart>
            <namePart type="family">Feldman</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Anna</namePart>
            <namePart type="family">Kazantseva</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Nils</namePart>
            <namePart type="family">Reiter</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Stan</namePart>
            <namePart type="family">Szpakowicz</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Vancouver, Canada</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Language processing architectures are often evaluated in near-to-perfect conditions with respect to processed content. The tools which perform sufficiently well on electronic press, books and other type of non-interactive content may poorly handle littered, colloquial and multilingual textual data which make the majority of communication today. This paper aims at investigating how Polish Twitter data (in a slightly controlled ‘political’ flavour) differs from expectation of linguistic tools and how they could be corrected to be ready for processing by standard language processing chains available for Polish. The setting includes specialised components for spelling correction of tweets as well as hashtag and username decoding.</abstract>
    <identifier type="citekey">ogrodniczuk-kopec-2017-lexical</identifier>
    <identifier type="doi">10.18653/v1/W17-2215</identifier>
    <location>
        <url>https://aclanthology.org/W17-2215/</url>
    </location>
    <part>
        <date>2017-08</date>
        <extent unit="page">
            <start>115</start>
            <end>125</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Lexical Correction of Polish Twitter Political Data
%A Ogrodniczuk, Maciej
%A Kopeć, Mateusz
%Y Alex, Beatrice
%Y Degaetano-Ortlieb, Stefania
%Y Feldman, Anna
%Y Kazantseva, Anna
%Y Reiter, Nils
%Y Szpakowicz, Stan
%S Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
%D 2017
%8 August
%I Association for Computational Linguistics
%C Vancouver, Canada
%F ogrodniczuk-kopec-2017-lexical
%X Language processing architectures are often evaluated in near-to-perfect conditions with respect to processed content. The tools which perform sufficiently well on electronic press, books and other type of non-interactive content may poorly handle littered, colloquial and multilingual textual data which make the majority of communication today. This paper aims at investigating how Polish Twitter data (in a slightly controlled ‘political’ flavour) differs from expectation of linguistic tools and how they could be corrected to be ready for processing by standard language processing chains available for Polish. The setting includes specialised components for spelling correction of tweets as well as hashtag and username decoding.
%R 10.18653/v1/W17-2215
%U https://aclanthology.org/W17-2215/
%U https://doi.org/10.18653/v1/W17-2215
%P 115-125

Markdown (Informal)

[Lexical Correction of Polish Twitter Political Data](https://aclanthology.org/W17-2215/) (Ogrodniczuk & Kopeć, LaTeCH 2017)
