Local Methods for On-Demand Out-of-Vocabulary Word Retrieval

Stanislas Oger, Georges Linarès, Frédéric Béchet


Abstract
Most Web-based methods for lexicon augmentation capture global semantic features of the target domain in order to collect relevant documents from the Web. We suggest that the local context of out-of-vocabulary (OOV) words carries relevant information about those words. Using this information, we propose to use the Web to build locally augmented lexicons, which are then used in a final local decoding pass. First, we propose an automatic Web-based method for detecting OOV words. Then, we demonstrate the relevance of the Web for OOV word retrieval and propose several methods for retrieving the hypothesis words. Using the reference context, we retrieve about 26% of the OOV words with a lexicon increase of fewer than 1000 words.
Anthology ID:
L08-1044
Volume:
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Month:
May
Year:
2008
Address:
Marrakech, Morocco
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
URL:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/193_paper.pdf
Cite (ACL):
Stanislas Oger, Georges Linarès, and Frédéric Béchet. 2008. Local Methods for On-Demand Out-of-Vocabulary Word Retrieval. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
Cite (Informal):
Local Methods for On-Demand Out-of-Vocabulary Word Retrieval (Oger et al., LREC 2008)