Proceedings of the Workshop on Collaborative Translation: technology, crowdsourcing, and the translator perspective
In the wake of the January 12 earthquake in Haiti, it quickly became clear that the existing emergency response services had failed, but text messages were still getting through. A number of people quickly came together to establish a text-message-based emergency reporting system. There was one hurdle: the majority of the messages were in Haitian Kreyol, which for the most part was not understood by the primary emergency responders, the US military. We therefore crowdsourced the translation of messages, allowing volunteers from the Haitian Kreyol- and French-speaking communities to translate, categorize and geolocate the messages in real time. Collaborating online, they employed their local knowledge of locations, regional slang, abbreviations and spelling variants to process more than 40,000 messages in the first six weeks alone. According to the responders, this saved hundreds of lives and helped direct the first food and aid to tens of thousands. The average turnaround from a message arriving in Kreyol to its being translated, categorized, geolocated and streamed back to the responders was 10 minutes. Collaboration among translators was crucial for data quality, motivation and community contacts, and enabled richer value to be added in translation than any one person could have provided.
The recent emergence of crowdsourced translation à la Facebook or Twitter has exposed a raw nerve in the translation industry. Perceptions of ill-placed entitlement -- we are the professionals who have the "right" to translate these products -- abound. And many have felt threatened by something that carries not only a relatively newly coined term -- crowdsourcing -- but seems in and of itself completely new. Or is it?
Targeted paraphrasing is a new approach to the problem of obtaining cost-effective, reasonable-quality translation that combines simple and inexpensive human computations by monolingual speakers with machine translation. The key insight behind the process is that it is possible to spot likely translation errors with only monolingual knowledge of the target language, and it is possible to generate alternative ways of saying the same thing (i.e., paraphrases) with only monolingual knowledge of the source language. Evaluations demonstrate that this approach can yield substantial improvements in translation quality.
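The loop the abstract describes can be sketched in a few lines. This is a minimal illustration, not the authors' system: the toy word-for-word "MT" table, the vocabulary-based error spotter, and the paraphrase table are all hypothetical stand-ins for the crowdsourced steps (a target-language speaker flags a suspicious word; a source-language speaker supplies a paraphrase; the paraphrased source is re-translated).

```python
# Toy MT system: word-for-word lookup with a deliberate word-sense error
# ("banc" -> "bank" instead of "bench") for the error-spotting step to find.
TOY_MT = {
    "je": "I", "suis": "am", "assis": "sitting", "sur": "on",
    "le": "the", "banc": "bank", "siège": "seat",
}

def machine_translate(source_words):
    """Translate word by word, marking unknown source words."""
    return [TOY_MT.get(w, f"<{w}>") for w in source_words]

def spot_errors(target_words, acceptable):
    """Monolingual target-side step: flag positions whose words a
    target-language speaker would not accept in this context."""
    return [i for i, w in enumerate(target_words) if w not in acceptable]

def get_paraphrases(source_words, index, table):
    """Monolingual source-side step: propose alternative source words
    (paraphrases) for the flagged position."""
    return table.get(source_words[index], [])

source = ["je", "suis", "assis", "sur", "le", "banc"]
target = machine_translate(source)

# Words a target-language speaker would accept here (hypothetical).
acceptable = {"I", "am", "sitting", "on", "the", "bench", "seat"}
flagged = spot_errors(target, acceptable)          # "bank" is flagged

# Paraphrases a source-language speaker might offer (hypothetical).
paraphrases = {"banc": ["siège"]}
for i in flagged:
    for alt in get_paraphrases(source, i, paraphrases):
        candidate = machine_translate(source[:i] + [alt] + source[i + 1:])
        if not spot_errors(candidate, acceptable):
            target = candidate                     # paraphrase fixed the error
            break

print(" ".join(target))  # -> I am sitting on the seat
```

The point of the sketch is that neither human step requires bilingual knowledge: the error spotter sees only target-language text, and the paraphraser sees only source-language text, with machine translation bridging the two.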
WikiBABEL: A System for Multilingual Wikipedia Content
A. Kumaran | Naren Datha | B. Ashok | K. Saravanan | Anil Ande | Ashwani Sharma | Sridhar Vedantham | Vidya Natampally | Vikram Dendi | Sandor Maurice
This position paper outlines our project -- WikiBABEL -- which will be released as an open-source project for the creation of multilingual Wikipedia content, and which has the potential to produce parallel data as a by-product for machine translation research. We discuss its architecture, functionality and user-experience components, and briefly present an analysis emphasizing the resonance that the WikiBABEL design and the planned involvement with Wikipedia have with open-source communities in general and Wikipedians in particular.