Image-Image Search for Comparable Corpora Construction
Yu Hong | Liang Yao | Mengyi Liu | Tongtao Zhang | Wenxuan Zhou | Jianmin Yao | Heng Ji
Proceedings of the Sixth Workshop on Hybrid Approaches to Translation (HyTra6)
We present a novel method of comparable corpora construction. Unlike traditional methods, which rely heavily on linguistic features, our method takes only image similarity into consideration. We use an image-image search engine to obtain similar images, together with their captions in the source language and the target language. On this basis, we use the captions of similar images to construct sentence-level bilingual corpora. Experiments on 10,371 target captions show that our method achieves a precision of 0.85 in the top search results.
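The pairing idea can be illustrated with a minimal sketch. It is not the authors' system: it assumes each image is already represented by a feature vector, it replaces the image-image search engine with a brute-force cosine-similarity lookup, and the names `cosine`, `pair_captions`, and the `threshold` value are hypothetical choices made only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def pair_captions(source_items, target_items, threshold=0.9):
    """Pair each source caption with the caption of its most similar target image.

    Each item is a (feature_vector, caption) tuple. A pair is kept only if the
    best match reaches the (assumed) similarity threshold.
    """
    pairs = []
    for src_vec, src_cap in source_items:
        # Stand-in for the image-image search step: exhaustive nearest neighbour.
        best = max(target_items, key=lambda t: cosine(src_vec, t[0]), default=None)
        if best is not None and cosine(src_vec, best[0]) >= threshold:
            pairs.append((src_cap, best[1]))
    return pairs

# Toy data: made-up feature vectors and captions, not taken from the paper.
source_items = [
    ([1.0, 0.0, 0.2], "A dog runs on the beach."),
    ([0.0, 1.0, 0.0], "A red car parked outside."),
]
target_items = [
    ([0.9, 0.1, 0.2], "Un chien court sur la plage."),
    ([0.1, 0.1, 1.0], "Un coucher de soleil sur la montagne."),
]

pairs = pair_captions(source_items, target_items)
print(pairs)
```

In this toy run only the first source caption finds a sufficiently similar target image, so only one caption pair is kept; the second source image has no close visual match and is dropped rather than paired with an unrelated caption.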