Relevance of ASR for the Automatic Generation of Keywords Suggestions for TV programs
Véronique Malaisé | Luit Gazendam | Willemijn Heeren | Roeland Ordelman | Hennie Brugman
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

Semantic access to multimedia content in audiovisual archives depends to a large extent on the quantity and quality of the metadata, and particularly on the content descriptions attached to the individual items. However, the manual annotation of collections puts heavy demands on resources. A large number of archives are introducing (semi-)automatic annotation techniques for generating and/or enhancing metadata. The NWO-funded CATCH-CHOICE project has investigated the extraction of keywords from textual resources related to the TV programs to be archived (context documents), in collaboration with the Dutch audiovisual archives, Sound and Vision. This paper investigates the suitability of Automatic Speech Recognition transcripts produced in the CATCH-CHoral project for generating such keywords, which we evaluate against manual annotations of the documents and against keywords automatically generated from context documents describing the TV programs’ content.


Disambiguating automatic semantic annotation based on a thesaurus structure
Véronique Malaisé | Luit Gazendam | Hennie Brugman
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

The use/use for relationship in a thesaurus is usually more complex than the (para-)synonymy recommended in the ISO-2788 standard describing the content of these controlled vocabularies. The fact that a non-preferred term can refer to multiple preferred terms (only the latter are relevant in controlled indexing) makes this relationship difficult to use in automatic annotation applications: it generates ambiguity cases. In this paper, we present the CARROT algorithm, meant to rank the output of our Information Extraction pipeline, and show how this algorithm can be used to select the relevant preferred term out of different possibilities. This selection, which is based on the structure of the annotators’ thesaurus, is meant to provide keyword suggestions to human annotators in order to ease and speed up their daily work. We achieve a 95% success rate, and discuss these results along with perspectives for this experiment.

Anchoring Dutch Cultural Heritage Thesauri to WordNet: Two Case Studies
Véronique Malaisé | Antoine Isaac | Luit Gazendam | Hennie Brugman
Proceedings of the Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2007).


A Web Based General Thesaurus Browser to Support Indexing of Television and Radio Programs
Hennie Brugman | Véronique Malaisé | Luit Gazendam
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Documentation and retrieval processes at the Netherlands Institute for Sound and Vision are organized around a common thesaurus. To help improve the quality of these processes, the thesaurus was transformed into an RDF/OWL ontology and extended on the basis of implicit information and external resources. A thesaurus browser web application was designed, implemented, and tested with its future users.