Training Language Models without Appropriate Language Resources: Experiments with an AAC System for Disabled People

Tonio Wandmacher; Jean-Yves Antoine

Abstract

Statistical language models (LMs) are highly dependent on their training resources. This not only makes evaluation results difficult to interpret, it also has a detrimental effect on the use of LM-based applications. This question has already been studied by others. Considering a specific domain (text prediction in a communication aid for handicapped people), we address the problem from a different point of view: the influence of the language register. Using corpora from five different registers, we discuss three methods for adapting a language model to its actual language resource, ultimately reducing the effect of training dependency: (a) a simple cache model that raises the probability of the n most recently inserted words; (b) a user dictionary that stores every unseen word; and (c) a combined LM that interpolates a base model with a dynamically updated user model. Our evaluation is based on results obtained from a text prediction system built on a trigram LM.
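As a rough illustration of method (c), the sketch below interpolates a fixed base model with a user model that is updated as text is entered. It is a simplified toy, not the authors' implementation: it uses unigram probabilities rather than the trigram LM mentioned in the abstract, and the class name `InterpolatedUnigramLM`, the weight `lam`, and the example vocabulary are all assumptions made for illustration.

```python
from collections import Counter


class InterpolatedUnigramLM:
    """Toy combination of a fixed base model with a dynamically updated user model."""

    def __init__(self, base_probs, lam=0.7):
        # base_probs: word -> probability estimated from the base training corpus
        # lam: weight of the base model (hypothetical value, not from the paper)
        self.base_probs = dict(base_probs)
        self.lam = lam
        self.user_counts = Counter()  # grows as the user enters text

    def update(self, words):
        # Dynamic update: every entered word is added to the user model,
        # including words the base model has never seen.
        self.user_counts.update(words)

    def prob(self, word):
        total = sum(self.user_counts.values())
        p_base = self.base_probs.get(word, 0.0)
        if total == 0:
            # Empty user model: fall back to the base model alone.
            return p_base
        p_user = self.user_counts[word] / total
        return self.lam * p_base + (1.0 - self.lam) * p_user

    def predict(self, prefix, k=3):
        # Rank all known words that start with the typed prefix.
        vocab = set(self.base_probs) | set(self.user_counts)
        candidates = [w for w in vocab if w.startswith(prefix)]
        return sorted(candidates, key=lambda w: (-self.prob(w), w))[:k]


lm = InterpolatedUnigramLM({"the": 0.5, "therapy": 0.1, "cat": 0.4}, lam=0.5)
lm.update(["therapy", "therapy", "thesaurus", "cat"])
print(lm.predict("the"))
```

In this toy setup, words the user enters often move up the prediction list even when the base corpus rates them as rare, and previously unseen words become predictable at all, which is the kind of adaptation to the user's register that the abstract describes.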

Anthology ID: L06-1059
Volume: Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Month: May
Year: 2006
Address: Genoa, Italy
Editors: Nicoletta Calzolari, Khalid Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, Daniel Tapias
Venue: LREC
Publisher: European Language Resources Association (ELRA)
URL: http://www.lrec-conf.org/proceedings/lrec2006/pdf/113_pdf.pdf
Cite (ACL): Tonio Wandmacher and Jean-Yves Antoine. 2006. Training Language Models without Appropriate Language Resources: Experiments with an AAC System for Disabled People. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
Cite (Informal): Training Language Models without Appropriate Language Resources: Experiments with an AAC System for Disabled People (Wandmacher & Antoine, LREC 2006)
PDF: http://www.lrec-conf.org/proceedings/lrec2006/pdf/113_pdf.pdf
