2016
Providing a Catalogue of Language Resources for Commercial Users
Bente Maegaard | Lina Henriksen | Andrew Joscelyne | Vesna Lusicky | Margaretha Mazura | Sussi Olsen | Claus Povlsen | Philippe Wacker
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Language resources (LRs) are indispensable for the development of tools for machine translation (MT) or various kinds of computer-assisted translation (CAT). In particular, language corpora, both parallel and monolingual, are considered most important, for instance for MT (not only SMT but also hybrid MT). The Language Technology Observatory will provide easy access to information about LRs deemed useful for MT and other translation tools through its LR Catalogue. To determine which aspects of an LR are useful for MT practitioners, a user study was conducted, providing a guide to the most relevant metadata and the most relevant quality criteria. We have seen that many resources exist which are useful for MT and similar work, but the majority are for (academic) research or educational use only, and as such are not available for commercial use. Our work has revealed a list of gaps: a coverage gap, an awareness gap, a quality gap, and a quantity gap. The paper ends with recommendations for a forward-looking strategy.
2014
Encompassing a spectrum of LT users in the CLARIN-DK Infrastructure
Lina Henriksen | Dorte Haltrup Hansen | Bente Maegaard | Bolette Sandford Pedersen | Claus Povlsen
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
CLARIN-DK is a platform with language resources constituting the Danish part of the European infrastructure CLARIN ERIC. Unlike some other language-based infrastructures, CLARIN-DK is not solely a repository for upload and storage of data, but also a platform of web services permitting the user to process data in various ways. This involves considerable complications in relation to workflow requirements. The CLARIN-DK interface must guide the user through the necessary steps of a workflow, even when the user is inexperienced and perhaps has an unclear conception of the requested results. This paper describes a user-driven approach to creating a user interface specification for CLARIN-DK. We indicate how different user profiles determined different crucial interface design options. We also describe some use cases established in order to give illustrative examples of how the platform may facilitate research.
2012
Creation and use of Language Resources in a Question-Answering eHealth System
Ulrich Andersen | Anna Braasch | Lina Henriksen | Csaba Huszka | Anders Johannsen | Lars Kayser | Bente Maegaard | Ole Norgaard | Stefan Schulz | Jürgen Wedekind
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
ESICT (Experience-oriented Sharing of health knowledge via Information and Communication Technology) is an ongoing research project funded by the Danish Council for Strategic Research. It aims at developing a health/disease-related information system based on information technology, language technology, and formalized medical knowledge. The formalized medical knowledge consists partly of the terminology database SNOMED CT and partly of authorized medical texts on the domain. The system will allow users to ask questions in Danish and will provide natural language answers. Currently, the project is pursuing three fundamentally different methods for question answering, and they are all described to some extent in this paper. A system prototype will handle questions related to diabetes and heart diseases. This paper concentrates on the methods employed for question answering and the language resources that are utilized. Some resources, such as SNOMED CT, already existed; others, such as a corpus of sample questions, have had to be created.
2008
Merging a Syntactic Resource with a WordNet: a Feasibility Study of a Merge between STO and DanNet
Bolette Sandford Pedersen | Anna Braasch | Lina Henriksen | Sussi Olsen | Claus Povlsen
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
This paper presents a feasibility study of a merge between SprogTeknologisk Ordbase (STO), which contains morphological and syntactic information, and DanNet, which is a Danish WordNet containing semantic information in terms of synonym sets and semantic relations. The aim of the merge is to develop a richer, composite resource which we believe will have a broader usage perspective than the two seen in isolation. In STO, the organizing principle is based on the observable syntactic features of a lemma's near context (labeled syntactic units or SynUs). In contrast, the basic unit in DanNet is constituted by semantic senses or, in wordnet terminology, synonym sets (synsets). The merge of the two resources is thus basically to be understood as a linking between SynUs and synsets. In the paper we discuss which parts of the merge can be performed semi-automatically and which parts require manual linguistic matching procedures. We estimate that this manual work will amount to approx. 39% of the lexicon material.
2006
EuroTermBank - a Terminology Resource based on Best Practice
Lina Henriksen | Claus Povlsen | Andrejs Vasiljevs
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
The new EU member countries face the problems of terminology resource fragmentation and lack of coordination in terminology development in general. The EuroTermBank project aims to help improve the terminology infrastructure of the new EU countries, and the project will result in a centralized online terminology bank, interlinked to other terminology banks and resources, for languages of the new EU member countries. The main focus of this paper is on a description of how to identify best practice within terminology work seen from a broad perspective. Surveys of real-life terminology work have been conducted, and these surveys have resulted in the identification of scenario-specific best practice descriptions of terminology work. Furthermore, this paper will present an outline of the specific criteria that have been used for selection of existing term resources to be included in the EuroTermBank database.
The MULINCO corpus and corpus platform
Bente Maegaard | Lene Offersgaard | Lina Henriksen | Hanne Jansen | Xavier Lepetit | Costanza Navarretta | Claus Povlsen
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
The MULINCO project (MUltiLINgual Corpus of the University of Copenhagen) started in early 2005. The purpose of this cross-disciplinary project is to create a corpus platform for education and research in monolingual and translation studies. The project covers two main types of corpus texts: literary and non-literary. The platform is being developed using available tools as far as possible and integrating them in a very open architecture. In this paper we describe the current status and future developments of both the text and tool side of the corpus platform, and we show some examples of student exercises taking advantage of tagged and aligned texts.
2004
Corporate Voice, Tone of Voice and Controlled Language Techniques
Lina Henriksen | Bart Jongejan | Bente Maegaard
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)