Paul Meurer


2018

The Abkhaz National Corpus
Paul Meurer
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Quote Extraction and Attribution from Norwegian Newspapers
Andrew Salway | Paul Meurer | Knut Hofland | Øystein Reigem
Proceedings of the 21st Nordic Conference on Computational Linguistics

Exploring Treebanks with INESS Search
Victoria Rosén | Helge Dyvik | Paul Meurer | Koenraad De Smedt
Proceedings of the 21st Nordic Conference on Computational Linguistics

2016

NorGramBank: A ‘Deep’ Treebank for Norwegian
Helge Dyvik | Paul Meurer | Victoria Rosén | Koenraad De Smedt | Petter Haugereid | Gyri Smørdal Losnegaard | Gunn Inger Lyse | Martha Thunes
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present NorGramBank, a treebank for Norwegian with highly detailed LFG analyses. It is one of many treebanks made available through the INESS treebanking infrastructure. NorGramBank was constructed as a parsebank, i.e. by automatically parsing a corpus using the wide-coverage grammar NorGram. One part, consisting of 350,000 words, has been manually disambiguated using computer-generated discriminants. A larger part of 50 million words has been stochastically disambiguated. The treebank is dynamic: global reparsing at certain intervals keeps it compatible with the latest versions of the grammar and the lexicon, which are continually developed further in interaction with the annotators. A powerful query language, INESS Search, has been developed for search across formalisms in the INESS treebanks, including LFG c- and f-structures. Evaluation shows that the grammar provides about 85% of randomly selected sentences with good analyses. Agreement among the annotators responsible for manual disambiguation is satisfactory, but also suggests desirable simplifications of the grammar.

2013

ParGramBank: The ParGram Parallel Treebank
Sebastian Sulger | Miriam Butt | Tracy Holloway King | Paul Meurer | Tibor Laczkó | György Rákosi | Cheikh Bamba Dione | Helge Dyvik | Victoria Rosén | Koenraad De Smedt | Agnieszka Patejuk | Özlem Çetinoğlu | I Wayan Arka | Meladel Mistica
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The INESS Treebanking Infrastructure
Paul Meurer | Helge Dyvik | Victoria Rosén | Koenraad De Smedt | Gunn Inger Lyse | Gyri Smørdal Losnegaard | Martha Thunes
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

2008

Speeding up LFG Parsing Using C-Structure Pruning
Aoife Cahill | John T. Maxwell III | Paul Meurer | Christian Rohrer | Victoria Rosén
Coling 2008: Proceedings of the workshop on Grammar Engineering Across Frameworks

2007

Towards hybrid quality-oriented machine translation – on linguistics and probabilities in MT
Stephan Oepen | Erik Velldal | Jan Tore Lønning | Paul Meurer | Victoria Rosén | Dan Flickinger
Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers

2006

The ASK Corpus - a Language Learner Corpus of Norwegian as a Second Language
Kari Tenfjord | Paul Meurer | Knut Hofland
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We present the design and interface of ASK, a language learner corpus of Norwegian as a second language. It contains essays collected from language tests at two different proficiency levels, as well as personal data from the test takers. The corpus also contains texts and relevant personal data from native Norwegians as control data. The texts and the personal data are marked up in XML according to the TEI Guidelines. To classify “errors” in the texts, we have introduced new attributes to the TEI corr and sic tags. For each error tag, the text annotation also includes a correct form. Finally, we employ an automatic tagger developed for standard Norwegian, the “Oslo-Bergen Tagger”, together with a facility for manual tag correction. As the corpus query system, we use the Corpus Workbench developed at the University of Stuttgart, together with a web search interface developed at Aksis, University of Bergen. The system supports searching for combinations of words, error types, grammatical annotation and personal data.

2005

Holistic regression testing for high-quality MT: some methodological and technological reflections
Stephan Oepen | Helge Dyvik | Dan Flickinger | Jan Tore Lønning | Paul Meurer | Victoria Rosén
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

2004

Som å kapp-ete med trollet? – Towards MRS-based Norwegian-English machine translation
Stephan Oepen | Helge Dyvik | Jan Tore Lønning | Erik Velldal | Dorothee Beerman | John Carroll | Dan Flickinger | Lars Hellan | Janne Bondi Johannessen | Paul Meurer | Torbjørn Nordgård | Victoria Rosén
Proceedings of the 10th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages