WeBiText: Multilingual Concordancer Built from Public High Quality Web Content

Alain Désilets


Abstract
In this paper, we describe WeBiText (www.webitext.ca) and how it is being used. WeBiText is a concordancer that lets translators search large, high-quality multilingual websites for solutions to translation problems. After a quick overview of the system, we present results from an analysis of its logs, which provides a picture of how the tool is being used and how well it performs. We show that it is mostly used to find solutions for short, two- or three-word translation problems. The system produces at least one hit for 58% of the queries, and hits from at least five different web pages in 41% of cases. We show that 36% of the queries correspond to specialized language problems, which is much higher than what was previously reported for a similar concordancer based on the Canadian Hansard (TransSearch). We also provide a back-of-the-envelope calculation of the current economic impact of the tool, which we estimate at $1 million per year and growing rapidly.
Anthology ID:
2010.amta-government.13
Volume:
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Government MT User Program
Month:
October 31-November 4
Year:
2010
Address:
Denver, Colorado, USA
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
URL:
https://aclanthology.org/2010.amta-government.13
Cite (ACL):
Alain Désilets. 2010. WeBiText: Multilingual Concordancer Built from Public High Quality Web Content. In Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Government MT User Program, Denver, Colorado, USA. Association for Machine Translation in the Americas.
Cite (Informal):
WeBiText: Multilingual Concordancer Built from Public High Quality Web Content (Désilets, AMTA 2010)