Terminology Management for Web Monitoring

Sean Colbath

The current state of the art in speech recognition, machine translation, and natural language processing (NLP) technologies has enabled the development of powerful media monitoring systems that give today's analysts automatic tools for ingesting and searching different types of data, such as broadcast video, web pages, documents, and scanned images. However, the core human-language technologies (HLT) in these media monitoring systems are static learners, which means that they learn from a pool of labeled data and apply the induced knowledge to operational data in the field. To enable successful and widespread deployment and adoption of HLT, these technologies need to be able to adapt effectively to new operational domains on demand. To provide the US Government analyst with dynamic tools that adapt to these changing domains, these HLT systems must support customizable lexicons. However, the lexicon customization capability in HLT systems presents a challenge of its own, especially given that typical media monitoring system installations in the field have multiple users. Lexicon customization requests from multiple users can be quite extensive, and may conflict in orthographic representation (spelling, transliteration, or stylistic consistency) or in overall meaning. To protect against spurious and inconsistent updates to the system, the media monitoring systems need to support a central terminology management capability to collect, manage, and execute customization requests across multiple users of the system. In this talk, we will describe the integration of a user-driven lexicon/dictionary customization and terminology management capability in the context of the Raytheon BBN Web Monitoring System (WMS) to allow intelligence analysts to update the Machine Translation (MT) system in the WMS with domain- and mission-specific source-to-English phrase translation rules.
The Language Learning Broker (LLB) tool from the Technology Development Group (TDG) is a distributed system that supports dictionary/terminology management, personalized dictionaries, and a workflow between linguists and linguist management. LLB is integrated with the WMS to provide a terminology management capability for users to submit, review, validate, and manage customizations of the MT system through the WMS User Interface (UI). We will also describe an ongoing controlled experiment, conducted with the help of intelligence analysts, to measure the effectiveness of this user-driven customization capability in terms of increased translation utility.
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Government MT User Program
October 31-November 4
Denver, Colorado, USA
Association for Machine Translation in the Americas
