Corpus of 19th-century Czech Texts: Problems and Solutions

Karel Kučera, Martin Stluka


Abstract
Although 19th-century Czech forms the roots of modern Czech, and many features of the 20th- and 21st-century language cannot be properly understood without this historical background, the Czech of the 19th century has not yet been researched thoroughly and consistently. The long-term project of a corpus of 19th-century Czech printed texts, currently in its third year, is intended to stimulate this research and to provide a firm material basis for it. In our opinion, the project is worth mentioning because it faces an unusual concentration of problems. These follow mostly from the fact that the 19th century was arguably the most tumultuous period in the history of Czech, and from the fact that Czech is a highly inflectional language with a long history of sound changes, orthography reforms and rather discontinuous vocabulary development. The paper briefly characterizes the general background of the problems and presents the reasoning behind the solutions implemented in the ongoing project.
Anthology ID:
L14-1271
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
165–168
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/300_Paper.pdf
Cite (ACL):
Karel Kučera and Martin Stluka. 2014. Corpus of 19th-century Czech Texts: Problems and Solutions. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 165–168, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Corpus of 19th-century Czech Texts: Problems and Solutions (Kučera & Stluka, LREC 2014)