Anföranden: Annotated and Augmented Parliamentary Debates from Sweden

Stian Rødven Eide


Abstract
The Swedish parliamentary debates have been available since 2010 through the parliament’s open data website, Riksdagens öppna data. While the data is fairly comprehensive, its structure can be hard to understand and its content is somewhat noisy for use as a quality language resource. In order to make the debates easier to use and process – in particular for language technology research, but also for political science and other fields with an interest in parliamentary data – we have published a large selection of them in a cleaned and structured format, annotated with linguistic information and augmented with semantic links. End-of-line hyphenations, which tokenisers are generally not equipped to handle, were especially prevalent in the parliament’s data, and much of the effort went into resolving them. In this paper, we provide detailed descriptions of the structure and contents of the resource, and explain how it differs from the parliament’s own version.
Anthology ID:
2020.parlaclarin-1.2
Volume:
Proceedings of the Second ParlaCLARIN Workshop
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Darja Fišer, Maria Eskevich, Franciska de Jong
Venue:
ParlaCLARIN
Publisher:
European Language Resources Association
Pages:
5–10
Language:
English
URL:
https://aclanthology.org/2020.parlaclarin-1.2
Cite (ACL):
Stian Rødven Eide. 2020. Anföranden: Annotated and Augmented Parliamentary Debates from Sweden. In Proceedings of the Second ParlaCLARIN Workshop, pages 5–10, Marseille, France. European Language Resources Association.
Cite (Informal):
Anföranden: Annotated and Augmented Parliamentary Debates from Sweden (Rødven Eide, ParlaCLARIN 2020)
PDF:
https://aclanthology.org/2020.parlaclarin-1.2.pdf