Czech National Corpus in 2020: Recent Developments and Future Outlook

Michal Křen

Abstract

The paper gives an overview of the state of implementation of the Czech National Corpus (CNC) in all the main areas of its operation: corpus compilation, annotation, application development and user services. As the focus is on recent developments, some of the areas are described in more detail than others. Close attention is paid to data collection and, in particular, to the description of web application development. This is not only because CNC has recently seen significant progress in this area, but also because we believe that end-user web applications shape the way linguists and other scholars think about language data and about the range of possibilities they offer. This consideration is even more important given the variability of the CNC corpora.

Anthology ID: 2020.cmlc-1.8
Volume: Proceedings of the 8th Workshop on Challenges in the Management of Large Corpora
Month: May
Year: 2020
Address: Marseille, France
Editors: Piotr Bański, Adrien Barbaresi, Simon Clematide, Marc Kupietz, Harald Lüngen, Ines Pisetta
Venue: CMLC
Publisher: European Language Resources Association
Pages: 52–57
Language: English
URL: https://aclanthology.org/2020.cmlc-1.8/
Bibkey: kren-2020-czech
Cite (ACL): Michal Křen. 2020. Czech National Corpus in 2020: Recent Developments and Future Outlook. In Proceedings of the 8th Workshop on Challenges in the Management of Large Corpora, pages 52–57, Marseille, France. European Language Resources Association.
Cite (Informal): Czech National Corpus in 2020: Recent Developments and Future Outlook (Křen, CMLC 2020)
PDF: https://aclanthology.org/2020.cmlc-1.8.pdf

BibTeX

@inproceedings{kren-2020-czech,
    title = "{C}zech National Corpus in 2020: Recent Developments and Future Outlook",
    author = "K{\v{r}}en, Michal",
    editor = {Ba{\'n}ski, Piotr  and
      Barbaresi, Adrien  and
      Clematide, Simon  and
      Kupietz, Marc  and
      L{\"u}ngen, Harald  and
      Pisetta, Ines},
    booktitle = "Proceedings of the 8th Workshop on Challenges in the Management of Large Corpora",
    month = may,
    year = "2020",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2020.cmlc-1.8/",
    pages = "52--57",
    language = "eng",
    ISBN = "979-10-95546-61-0",
    abstract = "The paper gives an overview of the state of implementation of the Czech National Corpus (CNC) in all the main areas of its operation: corpus compilation, annotation, application development and user services. As the focus is on recent developments, some of the areas are described in more detail than others. Close attention is paid to data collection and, in particular, to the description of web application development. This is not only because CNC has recently seen significant progress in this area, but also because we believe that end-user web applications shape the way linguists and other scholars think about language data and about the range of possibilities they offer. This consideration is even more important given the variability of the CNC corpora."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="kren-2020-czech">
    <titleInfo>
        <title>Czech National Corpus in 2020: Recent Developments and Future Outlook</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Michal</namePart>
        <namePart type="family">Křen</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2020-05</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <language>
        <languageTerm type="text">eng</languageTerm>
    </language>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 8th Workshop on Challenges in the Management of Large Corpora</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Piotr</namePart>
            <namePart type="family">Bański</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Adrien</namePart>
            <namePart type="family">Barbaresi</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Simon</namePart>
            <namePart type="family">Clematide</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Marc</namePart>
            <namePart type="family">Kupietz</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Harald</namePart>
            <namePart type="family">Lüngen</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Ines</namePart>
            <namePart type="family">Pisetta</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>European Language Resources Association</publisher>
            <place>
                <placeTerm type="text">Marseille, France</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
        <identifier type="isbn">979-10-95546-61-0</identifier>
    </relatedItem>
    <abstract>The paper gives an overview of the state of implementation of the Czech National Corpus (CNC) in all the main areas of its operation: corpus compilation, annotation, application development and user services. As the focus is on recent developments, some of the areas are described in more detail than others. Close attention is paid to data collection and, in particular, to the description of web application development. This is not only because CNC has recently seen significant progress in this area, but also because we believe that end-user web applications shape the way linguists and other scholars think about language data and about the range of possibilities they offer. This consideration is even more important given the variability of the CNC corpora.</abstract>
    <identifier type="citekey">kren-2020-czech</identifier>
    <location>
        <url>https://aclanthology.org/2020.cmlc-1.8/</url>
    </location>
    <part>
        <date>2020-05</date>
        <extent unit="page">
            <start>52</start>
            <end>57</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Czech National Corpus in 2020: Recent Developments and Future Outlook
%A Křen, Michal
%Y Bański, Piotr
%Y Barbaresi, Adrien
%Y Clematide, Simon
%Y Kupietz, Marc
%Y Lüngen, Harald
%Y Pisetta, Ines
%S Proceedings of the 8th Workshop on Challenges in the Management of Large Corpora
%D 2020
%8 May
%I European Language Resources Association
%C Marseille, France
%@ 979-10-95546-61-0
%G eng
%F kren-2020-czech
%X The paper gives an overview of the state of implementation of the Czech National Corpus (CNC) in all the main areas of its operation: corpus compilation, annotation, application development and user services. As the focus is on recent developments, some of the areas are described in more detail than others. Close attention is paid to data collection and, in particular, to the description of web application development. This is not only because CNC has recently seen significant progress in this area, but also because we believe that end-user web applications shape the way linguists and other scholars think about language data and about the range of possibilities they offer. This consideration is even more important given the variability of the CNC corpora.
%U https://aclanthology.org/2020.cmlc-1.8/
%P 52-57

Markdown (Informal)

[Czech National Corpus in 2020: Recent Developments and Future Outlook](https://aclanthology.org/2020.cmlc-1.8/) (Křen, CMLC 2020)
