OAL: A NLP Architecture to Improve the Development of Linguistic Resources for NLP

Javier Couto, Helena Blancafort, Somara Seng, Nicolas Kuchmann-Beauger, Anass Talby, Claude de Loupy


Abstract
The performance of most NLP applications relies upon the quality of linguistic resources. The creation, maintenance and enrichment of those resources are a labour-intensive task, especially when no tools are available. In this paper we present the NLP architecture OAL, designed to assist computational linguists in the whole process of the development of resources in an industrial context: from corpora compilation to quality assurance. To add new words more easily to the morphosyntactic lexica, a guesser that lemmatizes and assigns morphosyntactic tags as well as inflection paradigms to a new word has been developed. Moreover, different control mechanisms are set up to check the coherence and consistency of the resources. Today OAL manages resources in five European languages: French, English, Spanish, Italian and Polish. Chinese and Portuguese are in process. The development of OAL has followed an incremental strategy. At present, semantic lexica, a named entities guesser and a named entities phonetizer are being developed.
Anthology ID:
L10-1607
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/882_Paper.pdf
DOI:
Bibkey:
Cite (ACL):
Javier Couto, Helena Blancafort, Somara Seng, Nicolas Kuchmann-Beauger, Anass Talby, and Claude de Loupy. 2010. OAL: A NLP Architecture to Improve the Development of Linguistic Resources for NLP. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
OAL: A NLP Architecture to Improve the Development of Linguistic Resources for NLP (Couto et al., LREC 2010)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/882_Paper.pdf