Classifying TEI Encoding for DutchDraCor with Transformer Models

Florian Debaene; Veronique Hoste

doi:10.18653/v1/2025.law-1.11

Abstract

Computational Drama Analysis relies on well-structured textual data, yet many dramatic works still need to be encoded. The Dutch dramatic tradition is one such example: 180 plays are currently available in the DraCor database, while many more still await integration. To facilitate this process, we propose a semi-automated methodology for TEI encoding annotation that uses transformer encoder language models to classify structural elements in Dutch drama. We fine-tune four Dutch models on the DutchDraCor dataset to predict the nine most relevant labels used in the DraCor TEI encoding, experimenting with two model input settings. Our results show that incorporating additional context through beginning-of-sequence (BOS) and end-of-sequence (EOS) tokens greatly improves performance, raising the average macro F1 score across models from 0.717 to 0.923 (+0.206). Using the best-performing model, we generate silver-standard DraCor labels for EmDComF, an unstructured corpus of early modern Dutch comedies and farces, paving the way for its integration into DutchDraCor after validation.
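The abstract does not spell out how the classifier input is built or how the macro F1 score is computed, so the following Python sketch is only an illustration under stated assumptions. `build_input` contrasts a line-only setting with a setting that frames each line by its neighbouring lines, using a hypothetical `</s>` separator as a stand-in for the model's BOS/EOS special tokens; the paper's exact input format is not given here. `macro_f1` computes the unweighted mean of per-label F1 scores, so rare structural labels count as much as frequent ones. The label names (`head`, `speaker`, `p`, `stage`) are illustrative TEI element names, not necessarily the paper's nine labels.

```python
from collections import Counter

# Hypothetical separator standing in for the model's BOS/EOS special tokens.
# The exact input format used in the paper is an assumption here.
SEP = "</s>"


def build_input(lines, i, with_context):
    """Return the text fed to the classifier for line i.

    with_context=False: the line on its own.
    with_context=True: the line framed by its previous and next line
    (assumed format; missing neighbours become empty strings).
    """
    if not with_context:
        return lines[i]
    prev_line = lines[i - 1] if i > 0 else ""
    next_line = lines[i + 1] if i + 1 < len(lines) else ""
    return f"{prev_line} {SEP} {lines[i]} {SEP} {next_line}".strip()


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over all labels in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p where it was wrong
            fn[g] += 1  # missed the gold label g
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN)
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    lines = ["ACT I", "Jan.", "Good morning!", "(He leaves.)"]
    print(build_input(lines, 1, with_context=False))  # Jan.
    print(build_input(lines, 1, with_context=True))   # ACT I </s> Jan. </s> Good morning!

    gold = ["head", "speaker", "p", "stage"]
    pred = ["head", "speaker", "p", "p"]
    print(round(macro_f1(gold, pred), 3))  # 0.667 (mean of 1.0, 0.667, 1.0, 0.0)
```

Because macro F1 weights every label equally, a model that never predicts a rare label such as `stage` is penalized heavily even if it handles frequent labels well, which makes it a natural headline metric for an imbalanced structural labeling task.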

Anthology ID:: 2025.law-1.11
Volume:: Proceedings of the 19th Linguistic Annotation Workshop (LAW-XIX-2025)
Month:: July
Year:: 2025
Address:: Vienna, Austria
Editors:: Siyao Peng, Ines Rehbein
Venues:: LAW | WS
Publisher:: Association for Computational Linguistics
Pages:: 137–141
URL:: https://aclanthology.org/2025.law-1.11/
DOI:: 10.18653/v1/2025.law-1.11
Cite (ACL):: Florian Debaene and Veronique Hoste. 2025. Classifying TEI Encoding for DutchDraCor with Transformer Models. In Proceedings of the 19th Linguistic Annotation Workshop (LAW-XIX-2025), pages 137–141, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):: Classifying TEI Encoding for DutchDraCor with Transformer Models (Debaene & Hoste, LAW 2025)
PDF:: https://aclanthology.org/2025.law-1.11.pdf
