ULex: new data models and a mobile environment for corpus enrichment.

Dafydd Gibbon


Abstract
The Ubiquitous Lexicon concept (ULex) has two sides. In the first kind of ubiquity, ULex combines prelexical corpus-based lexicon extraction and formatting techniques from speech technology and corpus linguistics for both language documentation and basic speech technology (e.g. speech synthesis), and proposes new XML models for the basic data types concerned, in order to enable standardisation and data interchange in these areas. The prelexical data types range from basic wordlists through diphone tables to concordance and interlinear glossing structures. While several proposals for standardising XML models of lexicon types are available, these more basic prelexical data types, which are important in lexical acquisition, have received little attention. In the second kind of ubiquity, ULex is implemented in a novel mobile environment to enable collaborative cross-platform use via a web application, either on the internet or, via a local hotspot, on an intranet. The application runs not only on standard PC types but also on tablet computers and smartphones, and is thereby also rendered truly ubiquitous in a geographical sense.
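To make the prelexical data types named in the abstract more concrete, the following is a minimal illustrative sketch of two of them: a frequency wordlist serialised as XML and a keyword-in-context concordance. It is not the ULex implementation and does not reproduce the XML models proposed in the paper. The element and attribute names (`wordlist`, `entry`, `freq`), the function names, and the toy corpus are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def wordlist_xml(tokens):
    """Build an XML wordlist with one <entry> per word type.

    The element and attribute names are hypothetical, not the ULex schema.
    """
    root = ET.Element("wordlist")
    for word, count in sorted(Counter(tokens).items()):
        entry = ET.SubElement(root, "entry", freq=str(count))
        entry.text = word
    return root


def concordance(tokens, keyword, width=2):
    """Return (left context, keyword, right context) for each occurrence."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Toy corpus, invented for this example.
corpus = "the cat sat on the mat and the dog sat on the rug"
tokens = tokenize(corpus)
xml_text = ET.tostring(wordlist_xml(tokens), encoding="unicode")
kwic = concordance(tokens, "sat")

print(xml_text)
for left, key, right in kwic:
    print(f"{left:>10} [{key}] {right}")
```

Both structures are derived from the same token list, which is the sense in which they are prelexical: they are extracted from a corpus before any lexicon entries are written.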
Anthology ID:
L12-1539
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3392–3398
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/905_Paper.pdf
Cite (ACL):
Dafydd Gibbon. 2012. ULex: new data models and a mobile environment for corpus enrichment. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 3392–3398, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
ULex: new data models and a mobile environment for corpus enrichment. (Gibbon, LREC 2012)
PDF:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/905_Paper.pdf