Pro-TEXT: an Annotated Corpus of Keystroke Logs

Aleksandra Miletić; Christophe Benzitoun; Georgeta Cislaru; Santiago Herrera-Yanez

Abstract

Pro-TEXT is a corpus of keystroke logs of texts written in French. Keystroke logs are recordings of the writing process as carried out on a keyboard; they keep track of every action taken by the writer (character additions, deletions, substitutions). As such, the Pro-TEXT corpus offers new insights into text genesis and the underlying cognitive processes from the production perspective. A subset of the corpus is linguistically annotated with parts of speech, lemmas and syntactic dependencies, making it suitable for studying interactions between linguistic and behavioural aspects of the writing process. The full corpus contains 202K tokens, while the annotated portion currently comprises 30K tokens. The annotated content is progressively being made available in a database-like CSV format and in CoNLL format, and work on an HTML-based visualisation tool is under way. To the best of our knowledge, Pro-TEXT is the first corpus of its kind in French.
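To make the notion of a keystroke log concrete, the following is a minimal sketch of how a sequence of writer actions (additions, deletions, substitutions) could be replayed to rebuild the text. The `Event` record, its field names and the `replay` function are hypothetical and chosen only for illustration; they do not reflect the actual Pro-TEXT log format, which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One logged writer action (hypothetical schema, not the Pro-TEXT format)."""
    kind: str        # "insert", "delete" or "substitute"
    pos: int         # character offset in the text at the time of the action
    text: str = ""   # characters added (unused for "delete")
    length: int = 0  # number of characters removed (unused for "insert")


def replay(events):
    """Apply the events in order and return the resulting text."""
    buf = []  # current text as a list of characters
    for ev in events:
        if ev.kind == "insert":
            buf[ev.pos:ev.pos] = ev.text
        elif ev.kind == "delete":
            del buf[ev.pos:ev.pos + ev.length]
        elif ev.kind == "substitute":
            buf[ev.pos:ev.pos + ev.length] = ev.text
        else:
            raise ValueError(f"unknown event kind: {ev.kind}")
    return "".join(buf)


# A short invented session: a typo is corrected, then an addition is withdrawn.
events = [
    Event("insert", 0, "Bonjuor"),
    Event("substitute", 4, "ou", 2),
    Event("insert", 7, " monde"),
    Event("delete", 7, length=6),
]
print(replay(events))  # Bonjour
```

The final text alone would show only "Bonjour"; the log additionally preserves the corrected typo and the withdrawn addition, which is the kind of process information such a corpus records.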

Anthology ID: 2022.lrec-1.184
Volume: Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month: June
Year: 2022
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 1732–1739
URL: https://aclanthology.org/2022.lrec-1.184/
Cite (ACL): Aleksandra Miletic, Christophe Benzitoun, Georgeta Cislaru, and Santiago Herrera-Yanez. 2022. Pro-TEXT: an Annotated Corpus of Keystroke Logs. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 1732–1739, Marseille, France. European Language Resources Association.
Cite (Informal): Pro-TEXT: an Annotated Corpus of Keystroke Logs (Miletic et al., LREC 2022)
PDF: https://aclanthology.org/2022.lrec-1.184.pdf
