Philipp Dressen


2020

Swiss-AL: A Multilingual Swiss Web Corpus for Applied Linguistics
Julia Krasselt | Philipp Dressen | Matthias Fluor | Cerstin Mahlow | Klaus Rothenhäusler | Maren Runte
Proceedings of the Twelfth Language Resources and Evaluation Conference

The Swiss Web Corpus for Applied Linguistics (Swiss-AL) is a multilingual (German, French, Italian) collection of texts from selected web sources. Unlike most other web corpora, it is not intended for NLP purposes but is designed to support data-based and data-driven research on societal and political discourses in Switzerland. It currently contains 8 million texts (approx. 1.55 billion tokens), including news and specialist publications; governmental opinions and parliamentary records; websites of political parties, companies, and universities; and statements from industry associations and NGOs. A flexible processing pipeline using state-of-the-art components allows researchers in applied linguistics to create tailor-made subcorpora for studying discourse in a wide range of domains. So far, Swiss-AL has been used successfully in research on Swiss public discourses on energy and on antibiotic resistance.