Comparison of String Similarity Measures for Obscenity Filtering

Ekaterina Chernyak

doi:10.18653/v1/W17-1415

Abstract

In this paper we address the problem of filtering obscene lexis in Russian texts. We use string similarity measures to find words similar or identical to words from a stop list and establish both a test collection and a baseline for the task. Our experiments show that a novel string similarity measure based on the notion of an annotated suffix tree outperforms some of the other well known measures.
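The abstract describes flagging words in a text that are similar or identical to entries in a stop list. The Python sketch below shows that general idea. It flags a token when its normalized Levenshtein similarity to some stop-list entry reaches a threshold. Levenshtein distance is an assumed stand-in for one of the "well known measures" the abstract refers to, which it does not name. It is not the annotated suffix tree measure the paper proposes. The function names, the 0.75 threshold, and the neutral placeholder stop list (no real obscene vocabulary) are all hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertion, deletion and substitution each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as the shorter string
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca from a
                current[j - 1] + 1,            # insert cb into a
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1]; 1.0 means the strings are identical."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def flag_tokens(text: str, stop_list: list[str], threshold: float = 0.75) -> list[str]:
    """Return tokens whose best similarity to any stop-list word is >= threshold.

    Hypothetical helper: whitespace tokenization and a fixed threshold are
    simplifying assumptions, not the paper's setup.
    """
    flagged = []
    for token in text.lower().split():
        word = token.strip(".,!?;:\"'()")
        best = max((similarity(word, s) for s in stop_list), default=0.0)
        if word and best >= threshold:
            flagged.append(word)
    return flagged


# Placeholder stop-list entry "пример" ("example"), standing in for a real stop list.
print(flag_tokens("Это примеры и прмер, но не слово.", ["пример"]))
# → ['примеры', 'прмер']
```

Here the inflected form "примеры" and the misspelling "прмер" are each one edit away from the stop-list entry, so both are caught, while unrelated words score well below the threshold. A single global threshold is a simplification: for short stop-list words one edit changes a large share of the characters, so in practice the threshold would likely need tuning per word length.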

Anthology ID: W17-1415
Volume: Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing
Month: April
Year: 2017
Address: Valencia, Spain
Editors: Tomaž Erjavec, Jakub Piskorski, Lidia Pivovarova, Jan Šnajder, Josef Steinberger, Roman Yangarber
Venue: BSNLP
SIG: SIGSLAV
Publisher: Association for Computational Linguistics
Pages: 97–101
URL: https://aclanthology.org/W17-1415/
DOI: 10.18653/v1/W17-1415
Bibkey: chernyak-2017-comparison
Cite (ACL): Ekaterina Chernyak. 2017. Comparison of String Similarity Measures for Obscenity Filtering. In Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing, pages 97–101, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal): Comparison of String Similarity Measures for Obscenity Filtering (Chernyak, BSNLP 2017)
PDF: https://aclanthology.org/W17-1415.pdf

Export citation

BibTeX

@inproceedings{chernyak-2017-comparison,
    title = "Comparison of String Similarity Measures for Obscenity Filtering",
    author = "Chernyak, Ekaterina",
    editor = "Erjavec, Toma{\v{z}}  and
      Piskorski, Jakub  and
      Pivovarova, Lidia  and
      {\v{S}}najder, Jan  and
      Steinberger, Josef  and
      Yangarber, Roman",
    booktitle = "Proceedings of the 6th Workshop on {B}alto-{S}lavic Natural Language Processing",
    month = apr,
    year = "2017",
    address = "Valencia, Spain",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W17-1415/",
    doi = "10.18653/v1/W17-1415",
    pages = "97--101",
    abstract = "In this paper we address the problem of filtering obscene lexis in Russian texts. We use string similarity measures to find words similar or identical to words from a stop list and establish both a test collection and a baseline for the task. Our experiments show that a novel string similarity measure based on the notion of an annotated suffix tree outperforms some of the other well known measures."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="chernyak-2017-comparison">
    <titleInfo>
        <title>Comparison of String Similarity Measures for Obscenity Filtering</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Ekaterina</namePart>
        <namePart type="family">Chernyak</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2017-04</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Tomaž</namePart>
            <namePart type="family">Erjavec</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Jakub</namePart>
            <namePart type="family">Piskorski</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Lidia</namePart>
            <namePart type="family">Pivovarova</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Jan</namePart>
            <namePart type="family">Šnajder</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Josef</namePart>
            <namePart type="family">Steinberger</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Roman</namePart>
            <namePart type="family">Yangarber</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Valencia, Spain</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>In this paper we address the problem of filtering obscene lexis in Russian texts. We use string similarity measures to find words similar or identical to words from a stop list and establish both a test collection and a baseline for the task. Our experiments show that a novel string similarity measure based on the notion of an annotated suffix tree outperforms some of the other well known measures.</abstract>
    <identifier type="citekey">chernyak-2017-comparison</identifier>
    <identifier type="doi">10.18653/v1/W17-1415</identifier>
    <location>
        <url>https://aclanthology.org/W17-1415/</url>
    </location>
    <part>
        <date>2017-04</date>
        <extent unit="page">
            <start>97</start>
            <end>101</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Comparison of String Similarity Measures for Obscenity Filtering
%A Chernyak, Ekaterina
%Y Erjavec, Tomaž
%Y Piskorski, Jakub
%Y Pivovarova, Lidia
%Y Šnajder, Jan
%Y Steinberger, Josef
%Y Yangarber, Roman
%S Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing
%D 2017
%8 April
%I Association for Computational Linguistics
%C Valencia, Spain
%F chernyak-2017-comparison
%X In this paper we address the problem of filtering obscene lexis in Russian texts. We use string similarity measures to find words similar or identical to words from a stop list and establish both a test collection and a baseline for the task. Our experiments show that a novel string similarity measure based on the notion of an annotated suffix tree outperforms some of the other well known measures.
%R 10.18653/v1/W17-1415
%U https://aclanthology.org/W17-1415/
%U https://doi.org/10.18653/v1/W17-1415
%P 97-101

Markdown (Informal)

[Comparison of String Similarity Measures for Obscenity Filtering](https://aclanthology.org/W17-1415/) (Chernyak, BSNLP 2017)
