The Australian National Corpus: National Infrastructure for Language Resources

Steve Cassidy, Michael Haugh, Pam Peters, Mark Fallu


Abstract
The Australian National Corpus has been established in an effort to make currently scattered and relatively inaccessible data available to researchers through an online portal. In contrast to other national corpora, it is conceptualised as a linked collection of many existing and future language resources representing language use in Australia, unified through common technical standards. This approach allows us to bootstrap a significant collection and add value to existing resources by providing a unified, online tool-set to support research in a number of disciplines. This paper provides an outline of the technical platform being developed to support the corpus and a brief overview of some of the collections that form part of the initial version of the Australian National Corpus.
Anthology ID:
L12-1206
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3295–3299
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/400_Paper.pdf
Cite (ACL):
Steve Cassidy, Michael Haugh, Pam Peters, and Mark Fallu. 2012. The Australian National Corpus: National Infrastructure for Language Resources. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 3295–3299, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
The Australian National Corpus: National Infrastructure for Language Resources (Cassidy et al., LREC 2012)