Enrichment of Bilingual Dictionary through News Stream Data

Ajay Dubey; Parth Gupta; Vasudeva Varma; Paolo Rosso

Abstract

Bilingual dictionaries are a key component of cross-lingual similarity estimation methods. Such dictionaries are usually generated by manual or automatic means. Automatic generation approaches exploit parallel or comparable data to derive dictionary entries. Such approaches require a large amount of bilingual data in order to produce a good-quality dictionary. Often the language pair does not have large bilingual comparable corpora, and in such cases the best automatic dictionary is upper bounded by the quality and coverage of such corpora. In this work we propose a method which exploits continuous quasi-comparable corpora to derive term-level associations for the enrichment of such a limited dictionary. Though we present our experiments for English and Hindi, our approach can be easily extended to other languages. We evaluated the dictionary by manually computing the precision. In experiments we show that our approach is able to derive interesting term-level associations across languages.

Anthology ID: L14-1080
Volume: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month: May
Year: 2014
Address: Reykjavik, Iceland
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 3761–3765
URL: http://www.lrec-conf.org/proceedings/lrec2014/pdf/1105_Paper.pdf
Cite (ACL): Ajay Dubey, Parth Gupta, Vasudeva Varma, and Paolo Rosso. 2014. Enrichment of Bilingual Dictionary through News Stream Data. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 3761–3765, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal): Enrichment of Bilingual Dictionary through News Stream Data (Dubey et al., LREC 2014)
PDF: http://www.lrec-conf.org/proceedings/lrec2014/pdf/1105_Paper.pdf