BITS: a method for bilingual text search over the Web

Xiaoyi Ma, Mark Y. Liberman


Abstract
Parallel corpora are a valuable resource for machine translation, multilingual text retrieval, language education, and other applications, but for various reasons their availability is very limited at present. Having noticed that the World Wide Web is a potential source of parallel text, researchers are working to explore the Web in order to build large collections of bitext. This paper presents BITS (Bilingual Internet Text Search), a system that harvests multilingual texts from the World Wide Web with virtually no human intervention. The technique is simple, easy to port to any language pair, and highly accurate. Experimental results on the German-English pair show that the method is very successful.
Anthology ID:
1999.mtsummit-1.79
Volume:
Proceedings of Machine Translation Summit VII
Month:
September 13-17
Year:
1999
Address:
Singapore, Singapore
Venue:
MTSummit
Pages:
538–542
URL:
https://aclanthology.org/1999.mtsummit-1.79
Cite (ACL):
Xiaoyi Ma and Mark Y. Liberman. 1999. BITS: a method for bilingual text search over the Web. In Proceedings of Machine Translation Summit VII, pages 538–542, Singapore, Singapore.
Cite (Informal):
BITS: a method for bilingual text search over the Web (Ma & Liberman, MTSummit 1999)
PDF:
https://aclanthology.org/1999.mtsummit-1.79.pdf