The ASK Corpus - a Language Learner Corpus of Norwegian as a Second Language

Kari Tenfjord, Paul Meurer, Knut Hofland


Abstract
This paper presents the design and interface of ASK, a language learner corpus of Norwegian as a second language. The corpus contains essays collected from language tests at two different proficiency levels, together with personal data from the test takers. It also contains texts and relevant personal data from native Norwegians as control data. The texts and the personal data are marked up in XML according to the TEI Guidelines. To classify “errors” in the texts, we have introduced new attributes to the TEI corr and sic tags. For each error tag, a corrected form is also included in the text annotation. Finally, we employ an automatic tagger developed for standard Norwegian, the “Oslo-Bergen Tagger”, together with a facility for manual tag correction. As the corpus query system, we use the Corpus Workbench developed at the University of Stuttgart, together with a web search interface developed at Aksis, University of Bergen. The system allows searching for combinations of words, error types, grammatical annotation and personal data.
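
The error annotation described in the abstract can be illustrated with a minimal sketch. It assumes a flat, hypothetical markup in which each erroneous form is wrapped in a sic element carrying the corrected form in a corr attribute and an error class in a type attribute. The attribute name type, the values INFL and ORT, and the sample sentence are invented for illustration and are not taken from the ASK annotation scheme.

```python
import xml.etree.ElementTree as ET

# Hypothetical learner sentence. The "type" attribute and its values are
# assumptions made for this sketch, not the actual ASK error categories.
SAMPLE = (
    '<s>Jeg <sic type="INFL" corr="bor">bo</sic> i '
    '<sic type="ORT" corr="Bergen">bergen</sic> .</s>'
)


def extract_errors(xml_text):
    """Return (error_type, original_form, corrected_form) for each <sic>."""
    root = ET.fromstring(xml_text)
    return [
        (el.get("type"), el.text or "", el.get("corr"))
        for el in root.iter("sic")
    ]


def original_text(xml_text):
    """Return the sentence exactly as the learner wrote it."""
    root = ET.fromstring(xml_text)
    return "".join(root.itertext())


def corrected_text(xml_text):
    """Return the sentence with each <sic> replaced by its correction.

    Handles only flat markup (no nested elements), which is enough here.
    """
    root = ET.fromstring(xml_text)
    parts = [root.text or ""]
    for el in root:
        if el.tag == "sic":
            # Fall back to the original form if no correction is recorded.
            parts.append(el.get("corr", el.text or ""))
        else:
            parts.append(el.text or "")
        parts.append(el.tail or "")
    return "".join(parts)


if __name__ == "__main__":
    for error_type, wrong, right in extract_errors(SAMPLE):
        print(error_type, wrong, "->", right)
    print(original_text(SAMPLE))
    print(corrected_text(SAMPLE))
```

Keeping the learner's form as element content and the correction as an attribute means both the original and a normalized version of the text can be recovered from the same file, which is what makes combined searches over words and error types possible.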
Anthology ID:
L06-1345
Volume:
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Month:
May
Year:
2006
Address:
Genoa, Italy
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
URL:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/573_pdf.pdf