Włodzimierz Gruszczyński
2014
Digital Library 2.0: Source of Knowledge and Research Collaboration Platform
Włodzimierz Gruszczyński
|
Maciej Ogrodniczuk
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Digital libraries are frequently treated just as a new method of storage of digitized artifacts, with all consequences of transferring long-established ways of dealing with physical objects into the digital world. Such attitude improves availability, but often neglects other opportunities offered by global and immediate access, virtuality and linking ― as easy as never before. The article presents the idea of transforming a conventional digital library into knowledge source and research collaboration platform, facilitating content augmentation, interpretation and co-operation of geographically distributed researchers representing different academic fields. This concept has been verified by the process of extending descriptions stored in thematic Digital Library of Polish and Poland-related Ephemeral Prints from the 16th, 17th and 18th Centuries with extended item-associated information provided by historians, philologists, librarians and computer scientists. It resulted in associating the customary fixed metadata and digitized content with historical comments, mini-dictionaries of foreign interjections or explanation of less-known background details.
Measuring Readability of Polish Texts: Baseline Experiments
Bartosz Broda
|
Bartłomiej Nitoń
|
Włodzimierz Gruszczyński
|
Maciej Ogrodniczuk
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Measuring readability of a text is the first sensible step to its simplification. In this paper we present an overview of the most common approaches to automatic measuring of readability. Of the described ones, we implemented and evaluated: Gunning FOG index, Flesch-based Pisarek method. We also present two other approaches. The first one is based on measuring distributional lexical similarity of a target text and comparing it to reference texts. In the second one, we propose a novel method for automation of Taylor test ― which, in its base form, requires performing a large amount of surveys. The automation of Taylor test is performed using a technique called statistical language modelling. We have developed a free on-line web-based system and constructed plugins for the most common text editors, namely Microsoft Word and OpenOffice.org. Inner workings of the system are described in detail. Finally, extensive evaluations are performed for Polish ― a Slavic, highly inflected language. We show that Pisareks method is highly correlated to Gunning FOG Index, even if different in form, and that both the similarity-based approach and automated Taylor test achieve high accuracy. Merits of using either of them are discussed.