Swiss-AL: A Multilingual Swiss Web Corpus for Applied Linguistics

Julia Krasselt, Philipp Dressen, Matthias Fluor, Cerstin Mahlow, Klaus Rothenhäusler, Maren Runte


Abstract
The Swiss Web Corpus for Applied Linguistics (Swiss-AL) is a multilingual (German, French, Italian) collection of texts from selected web sources. Unlike most other web corpora, it is not intended for NLP purposes but is designed to support data-based and data-driven research on societal and political discourses in Switzerland. It currently contains 8 million texts (approx. 1.55 billion tokens), including news and specialist publications; governmental opinions and parliamentary records; websites of political parties, companies, and universities; and statements from industry associations and NGOs. A flexible processing pipeline using state-of-the-art components allows researchers in applied linguistics to create tailor-made subcorpora for studying discourse in a wide range of domains. So far, Swiss-AL has been used successfully in research on Swiss public discourses on energy and on antibiotic resistance.
Anthology ID:
2020.lrec-1.510
Volume:
Proceedings of the 12th Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
4145–4151
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.510
Cite (ACL):
Julia Krasselt, Philipp Dressen, Matthias Fluor, Cerstin Mahlow, Klaus Rothenhäusler, and Maren Runte. 2020. Swiss-AL: A Multilingual Swiss Web Corpus for Applied Linguistics. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 4145–4151, Marseille, France. European Language Resources Association.
Cite (Informal):
Swiss-AL: A Multilingual Swiss Web Corpus for Applied Linguistics (Krasselt et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.510.pdf