Automatic Recognition of Linguistic Replacements in Text Series Generated from Keystroke Logs

Daniel Couto-Vale, Stella Neumann, Paula Niemietz


Abstract
This paper introduces a toolkit for detecting replacements of grammatical and semantic structures during ongoing text production, as recorded in a chronological series of computer interaction events (so-called keystroke logs). Our case study involves human translation, where replacements can be indicative of translator behaviour that gives rise to specific features distinguishing translations from non-translated texts. The toolkit uses a novel Combinatory Categorial Grammar (CCG) chart parser, customised to recognise grammatical words independently of space and punctuation boundaries. On the basis of this linguistic analysis, ‘equivalence judges’ compare structures in different versions of the target text and classify them as potential equivalents of the same source-text segment. In this way, replacements of grammatical and semantic structures can be detected. Beyond this specific task, the approach should also be useful for analysing other types of spaceless text, such as Twitter hashtags, texts in agglutinative languages like Finnish, and texts in languages written without spaces like Chinese.
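The idea of recognising grammatical words independently of space and punctuation boundaries can be illustrated with a minimal sketch. This is not the authors' toolkit: the toy lexicon, the tuple encoding of categories, and the functions `normalise`, `combine`, `parse` and `recognises` are hypothetical, and only CCG forward and backward application are implemented. The input is first stripped of spaces and punctuation, every lexicon entry is looked up at every character offset, and the resulting lexical edges are then combined CKY-style over character spans.

```python
from itertools import product

# Category encoding (an assumption for this sketch): an atomic category is a
# str; a complex category is a tuple (result, slash, argument).
NP, N, S = "NP", "N", "S"


def fwd(result, arg):
    return (result, "/", arg)


def bwd(result, arg):
    return (result, "\\", arg)


# Hypothetical toy lexicon for illustration; not taken from the paper.
LEXICON = {
    "the": {fwd(NP, N)},
    "cat": {N},
    "dog": {N},
    "saw": {fwd(bwd(S, NP), NP)},
}


def normalise(text):
    """Drop spaces and punctuation so word recognition cannot depend on them."""
    return "".join(ch.lower() for ch in text if ch.isalnum())


def combine(left, right):
    """CCG forward (X/Y Y => X) and backward (Y X\\Y => X) application."""
    results = set()
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        results.add(left[0])
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        results.add(right[0])
    return results


def parse(text):
    """Return a chart mapping character spans (i, j) to sets of categories."""
    chars = normalise(text)
    n = len(chars)
    chart = {}
    # Lexical step: a word may start and end at any character offset.
    for i in range(n):
        for j in range(i + 1, n + 1):
            for cat in LEXICON.get(chars[i:j], ()):
                chart.setdefault((i, j), set()).add(cat)
    # CKY-style step: combine adjacent spans, shortest spans first.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                pairs = product(chart.get((i, k), ()), chart.get((k, j), ()))
                for left, right in pairs:
                    for cat in combine(left, right):
                        chart.setdefault((i, j), set()).add(cat)
    return chart, n


def recognises(text):
    """True if the whole normalised input is analysed as a sentence (S)."""
    chart, n = parse(text)
    return S in chart.get((0, n), set())
```

With this toy lexicon, `recognises("the cat saw the dog")` and `recognises("thecatsaw the,dog")` both return `True`, because the chart never consults word boundaries. A fuller parser of this kind would typically also need further CCG combinators (such as composition and type-raising), a broad-coverage lexicon, and some way of ranking the many spurious lexical matches that boundary-free lookup produces.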
Anthology ID:
L16-1574
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3617–3623
URL:
https://aclanthology.org/L16-1574
Cite (ACL):
Daniel Couto-Vale, Stella Neumann, and Paula Niemietz. 2016. Automatic Recognition of Linguistic Replacements in Text Series Generated from Keystroke Logs. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 3617–3623, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Automatic Recognition of Linguistic Replacements in Text Series Generated from Keystroke Logs (Couto-Vale et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1574.pdf