Baden Hughes


2006

Searching for Language Resources on the Web: User Behaviour in the Open Language Archives Community
Baden Hughes
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

While much effort is expended in the curation of language resources, such investment is largely irrelevant if users cannot locate resources of interest. The Open Language Archives Community (OLAC) was established to define standards for the description of language resources and provide core infrastructure for a virtual digital library, thus addressing the resource discovery issue. In this paper we consider naturalistic user search behaviour in the Open Language Archives Community. Specifically, we have collected the query logs from the OLAC Search Engine over a 2-year period, collecting in excess of 1.2 million queries, in over 450K user search sessions. Subsequently we have mined these to discover user search patterns of various types, all pertaining to the discovery of language resources. A number of interesting observations can be made based on this analysis; in this paper we report on a range of properties and behaviours based on empirical evidence.

Reconsidering Language Identification for Written Language Resources
Baden Hughes | Timothy Baldwin | Steven Bird | Jeremy Nicholson | Andrew MacKinlay
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The task of identifying the language in which a given document (ranging from a sentence to thousands of pages) is written has been relatively well studied over several decades. Automated approaches to written language identification are used widely throughout research and industrial contexts, over both oral and written source materials. Despite this widespread acceptance, a review of previous research in written language identification reveals a number of questions which remain open and ripe for further investigation.

Feature-based Encoding and Querying Language Resources with Character Semantics
Baden Hughes | Dafydd Gibbon | Thorsten Trippel
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper we discuss the explicit representation of character features pertaining to written language resources, which we argue are critically necessary for the long-term archiving of language data. Much focus on the creation of language resources and their associated preservation is at the level of the corpus itself; however, it is generally accepted that long-term interpretation of these language resources requires more than a best-practice data format. In particular, where language resources are created in linguistic fieldwork, and especially for minority languages, the need for preservation not only of the resource itself, but of additional metadata which allows for the resource to be accurately interpreted in the future, is becoming a topic of research in itself. In this paper we extend earlier work on semantically based character decomposition to include representation of character properties in a variety of models, and a mechanism for exploiting these properties through queries.

Frontiers in Linguistic Annotation for Lower-Density Languages
Mike Maxwell | Baden Hughes
Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora 2006

2005

A Distributed Architecture for Interactive Parse Annotation
Baden Hughes | James Haggerty | Joel Nothman | Saritha Manickam | James R. Curran
Proceedings of the Australasian Language Technology Workshop 2005

2004

Securing Interpretability: The Case of Ega Language Documentation
Dafydd Gibbon | Catherine Bow | Steven Bird | Baden Hughes
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Functional Requirements for an Interlinear Text Editor
Baden Hughes | Catherine Bow | Steven Bird
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Management of Metadata in Linguistic Fieldwork: Experience from the ACLA Project
Baden Hughes | David Penton | Steven Bird | Catherine Bow | Gillian Wigglesworth | Patrick McConvell | Jane Simpson
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

Proceedings of the Australasian Language Technology Workshop 2003
Catherine Bow | Baden Hughes
Proceedings of the Australasian Language Technology Workshop 2003

Encoding and presenting interlinear text using XML technologies
Baden Hughes | Steven Bird | Catherine Bow
Proceedings of the Australasian Language Technology Workshop 2003

Grid-Enabling Natural Language Engineering By Stealth
Baden Hughes | Steven Bird
Proceedings of the HLT-NAACL 2003 Workshop on Software Engineering and Architecture of Language Technology Systems (SEALTS)