Joel Wallenberg

Also published as: Joel C. Wallenberg


2014

Rapid Deployment of Phrase Structure Parsing for Related Languages: A Case Study of Insular Scandinavian
Anton Karl Ingason | Hrafn Loftsson | Eiríkur Rögnvaldsson | Einar Freyr Sigurðsson | Joel C. Wallenberg
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents ongoing work that aims to improve machine parsing of Faroese using a combination of Faroese and Icelandic training data. We show that even if we only have a relatively small parsed corpus of one language, namely 53,000 words of Faroese, we can obtain better results by adding information about phrase structure from a closely related language which has a similar syntax. Our experiment uses the Berkeley parser. We demonstrate that the addition of Icelandic data without any other modification to the experimental setup results in an f-measure improvement from 75.44% to 78.05% in Faroese and an improvement in part-of-speech tagging accuracy from 88.86% to 90.40%.

2012

The Icelandic Parsed Historical Corpus (IcePaHC)
Eiríkur Rögnvaldsson | Anton Karl Ingason | Einar Freyr Sigurðsson | Joel Wallenberg
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We describe the background for and building of IcePaHC, a one million word parsed historical corpus of Icelandic which has just been finished. This corpus, which is completely free and open, contains fragments of 60 texts ranging from the late 12th century to the present. We describe the text selection and text collecting process and discuss the quality of the texts and their conversion to modern Icelandic spelling. We explain why we choose to use a phrase structure Penn style annotation scheme and briefly describe the syntactic annotation process. We also describe a spin-off project which is only in its beginning stages: a parsed historical corpus of Faroese. Finally, we advocate the importance of an open source policy as regards language resources.

2008

Icelandic Data Driven Part of Speech Tagging
Mark Dredze | Joel Wallenberg
Proceedings of ACL-08: HLT, Short Papers