2013
Let’sMT! as a Learning Platform for SMT
Hanne Fersøe | Dorte Haltrup Hansen | Lene Offersgaard | Susi Olsen | Claus Povlsen
Proceedings of Machine Translation Summit XIV: User track
2008
LC-STAR II: Starring more Lexica
Ute Ziegenhain | Hanne Fersoe | Henk van den Heuvel | Asuncion Moreno
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
LC-STAR II is a follow-up to the EU-funded project LC-STAR (Lexica and Corpora for Speech-to-Speech Translation Components, IST-2001-32216). LC-STAR II develops large lexica containing information for speech processing in ten languages, targeting automatic speech recognition and text-to-speech synthesis in particular, but also other applications such as speech-to-speech translation and tagging. The project by and large follows the specifications developed within LC-STAR, which covered thirteen languages: Catalan, Finnish, German, Greek, Hebrew, Italian, Mandarin Chinese, Russian, Turkish, Slovenian, Spanish, Standard Arabic and US-English. The ten new LC-STAR II languages are: Brazilian-Portuguese, Cantonese, Czech, English-UK, French, Hindi, Polish, Portuguese, Slovak, and Urdu. The project started in 2006 with a lifetime of two years. It is funded by a consortium that includes Microsoft (USA), Nokia (Finland), NSC (Israel), Siemens (Germany) and Harman/Becker (Germany). The project is coordinated by UPC (Spain), and validation is performed by SPEX (The Netherlands) and CST (Denmark). The developed language resources will be shared among the partners. This paper summarizes the creation of word lists and lexica and gives an overview of how the specifications and the conceptual representation model from LC-STAR were adapted to the new languages. The validation procedure is also presented.
2006
Building Annotated Written and Spoken Arabic LRs in NEMLAR Project
M. Yaseen | M. Attia | B. Maegaard | K. Choukri | N. Paulsson | S. Haamid | S. Krauwer | C. Bendahman | H. Fersøe | M. Rashwan | B. Haddad | C. Mukbel | A. Mouradi | A. Al-Kufaishi | M. Shahin | N. Chenfour | A. Ragheb
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
The NEMLAR project (Network for Euro-Mediterranean LAnguage Resource and human language technology development and support, www.nemlar.org) was a project supported by the EC with partners from Europe and Arabic countries. Its objective was to build a network of specialized partners to promote and support the development of Arabic Language Resources (LRs) in the Mediterranean region. The project focused on identifying the state of the art of LRs in the region, assessing priority requirements through consultations with language industry and communication players, and establishing a protocol for identifying and developing a Basic Language Resource Kit (BLARK) for Arabic. The BLARK is defined as the minimal set of language resources necessary for any pre-competitive research and education, as well as for the development of crucial components for any future NLP industry. Following the identification of high-priority resources, the NEMLAR partners agreed to focus on and produce three main resources: 1) an annotated Arabic written corpus of about 500K words, 2) an Arabic speech corpus for TTS applications of 2x5 hours, and 3) an Arabic broadcast news speech corpus of 40 hours of Modern Standard Arabic. For each of the three LRs, the underlying linguistic models and assumptions, the technical specifications, the methodologies for collecting and building the resource, and the validation and verification mechanisms were defined and applied.
2004
Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes
Hanne Fersøe | Elviira Hartikainen | Henk van den Heuvel | Giulio Maltese | Asuncíon Moreno | Shaunie Shammass | Ute Ziegenhain
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
This paper presents specifications and requirements for the creation and validation of large lexica needed in Automatic Speech Recognition (ASR), Text-to-Speech (TTS) and statistical Speech-to-Speech Translation (SST) systems. The language resources are created and validated within the scope of the EU project LC-STAR (Lexica and Corpora for Speech-to-Speech Translation Components) during the years 2002-2005. Large lexica with phonetic, suprasegmental and morpho-syntactic content will be provided, with well-documented specifications, for 13 languages. A short summary of the LC-STAR project itself is presented, followed by an overview of the specifications for corpus collection and word extraction and of the specification and format of the lexica. Particular attention is paid to the validation of the produced lexica and the lessons learnt during pre-validation. The created and validated language resources will be available via ELRA/ELDA.
ENABLER Thematic Network of National Projects: Technical, Strategic and Political Issues of LRs
Nicoletta Calzolari | Khalid Choukri | Maria Gavrilidou | Bente Maegaard | Paola Baroni | Hanne Fersøe | Alessandro Lenci | Valérie Mapelli | Monica Monachini | Stelios Piperidis
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
ELRA Validation Methodology and Standard Promotion for Linguistic Resources
Hanne Fersøe | Monica Monachini
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
1997
Why don’t they use translation tools?
Hanne Fersøe
EAMT Workshop: Language Technology in your Organization?
1990
Representational Issues within Eurotra
Hanne Fersøe
Proceedings of the 7th Nordic Conference of Computational Linguistics (NODALIDA 1989)