Victoria Rosén


2017

pdf bib
Exploring Treebanks with INESS Search
Victoria Rosén | Helge Dyvik | Paul Meurer | Koenraad De Smedt
Proceedings of the 21st Nordic Conference on Computational Linguistics

2016

pdf bib
MWEs in Treebanks: From Survey to Guidelines
Victoria Rosén | Koenraad De Smedt | Gyri Smørdal Losnegaard | Eduard Bejček | Agata Savary | Petya Osenova
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

By means of an online survey, we have investigated ways in which various types of multiword expressions are annotated in existing treebanks. The results indicate that there is considerable variation in treatments across treebanks and thereby also, to some extent, across languages and across theoretical frameworks. The comparison is focused on the annotation of light verb constructions and verbal idioms. The survey shows that the light verb constructions either get special annotations as such, or are treated as ordinary verbs, while VP idioms are handled through different strategies. Based on insights from our investigation, we propose some general guidelines for annotating multiword expressions in treebanks. The recommendations address the following application-based needs: distinguishing MWEs from similar but compositional constructions; searching distinct types of MWEs in treebanks; awareness of literal and nonliteral meanings; and normalization of the MWE representation. The cross-lingually and cross-theoretically focused survey is intended as an aid to accessing treebanks and an aid for further work on treebank annotation.

pdf bib
NorGramBank: A ‘Deep’ Treebank for Norwegian
Helge Dyvik | Paul Meurer | Victoria Rosén | Koenraad De Smedt | Petter Haugereid | Gyri Smørdal Losnegaard | Gunn Inger Lyse | Martha Thunes
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present NorGramBank, a treebank for Norwegian with highly detailed LFG analyses. It is one of many treebanks made available through the INESS treebanking infrastructure. NorGramBank was constructed as a parsebank, i.e. by automatically parsing a corpus, using the wide coverage grammar NorGram. One part consisting of 350,000 words has been manually disambiguated using computer-generated discriminants. A larger part of 50 M words has been stochastically disambiguated. The treebank is dynamic: by global reparsing at certain intervals it is kept compatible with the latest versions of the grammar and the lexicon, which are continually further developed in interaction with the annotators. A powerful query language, INESS Search, has been developed for search across formalisms in the INESS treebanks, including LFG c- and f-structures. Evaluation shows that the grammar provides about 85% of randomly selected sentences with good analyses. Agreement among the annotators responsible for manual disambiguation is satisfactory, but also suggests desirable simplifications of the grammar.

2014

pdf bib
The Interplay Between Lexical and Syntactic Resources in Incremental Parsebanking
Victoria Rosén | Petter Haugereid | Martha Thunes | Gyri S. Losnegaard | Helge Dyvik
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Automatic syntactic analysis of a corpus requires detailed lexical and morphological information that cannot always be harvested from traditional dictionaries. In building the INESS Norwegian treebank, it is often the case that necessary lexical information is missing in the morphology or lexicon. The approach used to build the treebank is incremental parsebanking; a corpus is parsed with an existing grammar, and the analyses are efficiently disambiguated by annotators. When the intended analysis is unavailable after parsing, the reason is often that necessary information is not available in the lexicon. INESS has therefore implemented a text preprocessing interface where annotators can enter unrecognized words before parsing. This may concern words that are unknown to the morphology and/or lexicon, and also words that are known, but for which important information is missing. When this information is added, either during text preprocessing or during disambiguation, the result is that after reparsing the intended analysis can be chosen and stored in the treebank. The lexical information added to the lexicon in this way may be of great interest both to lexicographers and to other language technology efforts, and the enriched lexical resource being developed will be made available at the end of the project.

2013

pdf bib
ParGramBank: The ParGram Parallel Treebank
Sebastian Sulger | Miriam Butt | Tracy Holloway King | Paul Meurer | Tibor Laczkó | György Rákosi | Cheikh Bamba Dione | Helge Dyvik | Victoria Rosén | Koenraad De Smedt | Agnieszka Patejuk | Özlem Çetinoğlu | I Wayan Arka | Meladel Mistica
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
The INESS Treebanking Infrastructure
Paul Meurer | Helge Dyvik | Victoria Rosén | Koenraad De Smedt | Gunn Inger Lyse | Gyri Smørdal Losnegaard | Martha Thunes
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

2008

pdf bib
Speeding up LFG Parsing Using C-Structure Pruning
Aoife Cahill | John T. Maxwell III | Paul Meurer | Christian Rohrer | Victoria Rosén
Coling 2008: Proceedings of the workshop on Grammar Engineering Across Frameworks

2007

pdf bib
Theoretically Motivated Treebank Coverage
Victoria Rosén | Koenraad de Smedt
Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA 2007)

pdf bib
Towards hybrid quality-oriented machine translation – on linguistics and probabilities in MT
Stephan Oepen | Erik Velldal | Jan Tore Lønning | Paul Meurer | Victoria Rosén | Dan Flickinger
Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers

2005

pdf bib
Holistic regression testing for high-quality MT: some methodological and technological reflections
Stephan Oepen | Helge Dyvik | Dan Flickinger | Jan Tore Lønning | Paul Meurer | Victoria Rosén
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

2004

pdf bib
Som å kapp-ete med trollet? – Towards MRS-based Norwegian-English machine translation
Stephan Oepen | Helge Dyvik | Jan Tore Lønning | Erik Velldal | Dorothee Beerman | John Carroll | Dan Flickinger | Lars Hellan | Janne Bondi Johannessen | Paul Meurer | Torbjørn Nordgård | Victoria Rosén
Proceedings of the 10th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

2000

pdf bib
Automatic proofreading for Norwegian: The challenges of lexical and grammatical variation
Koenraad de Smedt | Victoria Rosén
Proceedings of the 12th Nordic Conference of Computational Linguistics (NODALIDA 1999)