Exploring the AFPAK Web

Rod Holland


Abstract
Despite low literacy levels in Afghanistan and the Tribal Areas of Pakistan, the Pashto and Dari regions of the World Wide Web manifest diverse content from authors with a broad range of viewpoints. We have used cross-language information retrieval (CLIR) with machine translation to explore this content, and we present an informal study of the principal genres that we have encountered. We also discuss the suitability and limitations of existing machine translation packages for exploiting content in these languages.
Anthology ID:
2010.amta-government.10
Volume:
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Government MT User Program
Month:
October 31-November 4
Year:
2010
Address:
Denver, Colorado, USA
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
URL:
https://aclanthology.org/2010.amta-government.10