Semi-Automated Elicitation Corpus Generation

Alison Alvarez, Lori Levin, Robert Frederking, Erik Peterson, Jeff Good


Abstract
This paper describes a semi-automated process for creating elicitation corpora. An elicitation corpus is translated by a bilingual consultant in order to produce high-quality, word-aligned sentence pairs. The corpus sentences are automatically generated from detailed feature structures using the GenKit generation program. The feature structures themselves are automatically generated from information that a linguist provides using our corpus specification software. This process helps us build small, flexible corpora for testing and developing machine translation systems.
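As a rough illustration of how feature structures might be enumerated from a linguist's specification, the following minimal sketch takes the cross product of feature values. It is not the authors' corpus specification software and does not call GenKit; the feature names, values, the `SPEC` dictionary, and the `generate_feature_structures` function are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical feature specification: each feature maps to its allowed values.
# These names and values are assumptions for illustration, not from the paper.
SPEC = {
    "subj-person": ["1", "2", "3"],
    "subj-number": ["sg", "pl"],
    "tense": ["past", "present"],
}


def generate_feature_structures(spec):
    """Return one flat feature structure (a dict) per combination of values."""
    names = sorted(spec)  # fixed feature order so the output is deterministic
    value_lists = [spec[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


structures = generate_feature_structures(SPEC)
```

In a pipeline like the one the abstract describes, each such structure would then be handed to a generation component to produce a sentence for translation. A real specification would presumably also restrict which combinations are generated, since a full cross product grows quickly as features are added.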
Anthology ID:
2005.mtsummit-posters.10
Volume:
Proceedings of Machine Translation Summit X: Posters
Month:
September 13-15
Year:
2005
Address:
Phuket, Thailand
Venue:
MTSummit
Pages:
388–395
URL:
https://aclanthology.org/2005.mtsummit-posters.10
Cite (ACL):
Alison Alvarez, Lori Levin, Robert Frederking, Erik Peterson, and Jeff Good. 2005. Semi-Automated Elicitation Corpus Generation. In Proceedings of Machine Translation Summit X: Posters, pages 388–395, Phuket, Thailand.
Cite (Informal):
Semi-Automated Elicitation Corpus Generation (Alvarez et al., MTSummit 2005)
PDF:
https://aclanthology.org/2005.mtsummit-posters.10.pdf