The ASK Corpus - a Language Learner Corpus of Norwegian as a Second Language

Kari Tenfjord; Paul Meurer; Knut Hofland

Abstract

In this paper we present the design and interface of ASK, a language learner corpus of Norwegian as a second language. The corpus contains essays collected from language tests at two different proficiency levels, together with personal data about the test takers. It also contains texts and relevant personal data from native Norwegians as control data. The texts and the personal data are marked up in XML according to the TEI Guidelines. To classify errors in the texts, we have introduced new attributes to the TEI corr and sic tags. For each error tag, the text annotation also includes a correct form. Finally, we employ an automatic tagger developed for standard Norwegian, the Oslo-Bergen Tagger, together with a facility for manual tag correction. As the corpus query system, we use the Corpus Workbench developed at the University of Stuttgart, together with a web search interface developed at Aksis, University of Bergen. The system supports searches for combinations of words, error types, grammatical annotation and personal data.
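
To make the error annotation idea concrete, here is a minimal Python sketch of how error-tagged text of this general kind could be read. It is not the ASK scheme: the abstract does not name the new attributes, so the `type` and `corr` attributes, the error codes `INFL` and `ORT`, the sample sentence and the function name `extract_errors` are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment. The attribute names "type" and "corr" and the
# error codes "INFL" and "ORT" are assumptions for illustration only;
# they are not taken from the ASK annotation scheme.
SAMPLE = (
    '<s>Jeg <sic type="INFL" corr="leser">lese</sic> mange '
    '<sic type="ORT" corr="bøker">boker</sic>.</s>'
)


def extract_errors(xml_text):
    """Return (error_type, learner_form, corrected_form) for each sic element."""
    root = ET.fromstring(xml_text)
    return [
        (el.get("type"), el.text, el.get("corr"))
        for el in root.iter("sic")
    ]


if __name__ == "__main__":
    for error_type, learner_form, corrected_form in extract_errors(SAMPLE):
        print(error_type, learner_form, corrected_form)
```

Keeping the learner form and a correct form together on the same element is what would let a query combine words with error types, as the abstract describes for the search interface.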

Anthology ID: L06-1345
Volume: Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Month: May
Year: 2006
Address: Genoa, Italy
Editors: Nicoletta Calzolari, Khalid Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, Daniel Tapias
Venue: LREC
Publisher: European Language Resources Association (ELRA)
External URL: http://www.lrec-conf.org/proceedings/lrec2006/pdf/573_pdf.pdf
Cite (ACL): Kari Tenfjord, Paul Meurer, and Knut Hofland. 2006. The ASK Corpus - a Language Learner Corpus of Norwegian as a Second Language. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
Cite (Informal): The ASK Corpus - a Language Learner Corpus of Norwegian as a Second Language (Tenfjord et al., LREC 2006)