Automatically Generated Online Dictionaries

Enikő Héja, Dávid Takács


Abstract
The aim of our software presentation is to demonstrate that corpus-driven bilingual dictionaries generated fully automatically are suitable for human use. Such dictionaries are needed especially for lesser used languages, where demand is too low for publishers to profit from investing in dictionary production. Previous experiments have shown that bilingual lexicons can be created by applying word alignment to parallel corpora. This approach, especially its corpus-driven nature, has several advantages over more traditional approaches. Most importantly, automatically obtained translation probabilities ensure that the most frequently used translations come first within an entry. However, the technique also faces some difficulties. In particular, parallel texts are scarce for medium density languages, which limits the size of the resulting dictionary. Our objective is to design and implement a dictionary building workflow and a query system that exploit the additional benefits of the method and overcome its disadvantages.
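The abstract's central claim is that translation probabilities obtained from word alignment put the most frequent translations first within an entry. Below is a minimal Python sketch of that ordering step. It assumes the aligner's output is already available as (source word, target word, probability) triples. The input format, the function name `build_dictionary`, and the `min_prob` cut-off are hypothetical illustrations, not the authors' actual workflow.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical input format (an assumption, not the authors' format):
# (source_word, target_word, probability) triples, e.g. read from a
# lexical translation table produced by a word aligner.
Triple = Tuple[str, str, float]


def build_dictionary(
    triples: Iterable[Triple],
    min_prob: float = 0.05,  # assumed cut-off for discarding noisy candidates
) -> Dict[str, List[Tuple[str, float]]]:
    """Group candidate translations by source word and sort each entry by
    descending translation probability, so the likeliest translation is listed first."""
    entries: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for src, tgt, prob in triples:
        if prob >= min_prob:
            entries[src].append((tgt, prob))
    # Highest probability first; ties are broken alphabetically for stable output.
    return {
        src: sorted(cands, key=lambda c: (-c[1], c[0]))
        for src, cands in entries.items()
    }


if __name__ == "__main__":
    # Toy Hungarian-English triples, invented for illustration only.
    toy = [
        ("ház", "home", 0.21),
        ("ház", "house", 0.72),
        ("ház", "building", 0.03),
        ("kutya", "dog", 0.9),
    ]
    d = build_dictionary(toy)
    print(d["ház"])  # → [('house', 0.72), ('home', 0.21)]
```

The threshold trades coverage for precision. A lower `min_prob` keeps more rare translations, which matters when the parallel corpus is small. A higher value keeps only well-attested candidates.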
Anthology ID:
L12-1346
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2487–2493
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/606_Paper.pdf
Cite (ACL):
Enikő Héja and Dávid Takács. 2012. Automatically Generated Online Dictionaries. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 2487–2493, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Automatically Generated Online Dictionaries (Héja & Takács, LREC 2012)