Cross Lingual Arabic Blog Alerting (COLABA)

Kathleen Egan


Abstract
Social media and tools for communication over the Internet have expanded a great deal in recent years. This expansion offers a diverse set of users a means to communicate more freely and spontaneously in mixed languages and genres (blogs, message boards, chat, texting, video and images). Dialectal Arabic is pervasive in written social media; however, current state-of-the-art tools built for Modern Standard Arabic (MSA) fail on Arabic dialects. COLABA enables MSA users to interpret dialects correctly. It helps find colloquial Arabic content that is currently not easily searchable or accessible through MSA queries. The COLABA team has built a suite of tools that will offer users the ability to anonymously capture unstructured online media content from blogs and to comprehend, organize, and validate content from informal and colloquial genres of online communication in MSA and a variety of Arabic dialects. The DoD/Combating Terrorism Technical Support Office/Technical Support Working Group (CTTSO/TSWG) awarded the contract to Acxiom Corporation and partners from MTI/IBM, Columbia University, Janya and Wichita State University to bring joint expertise to address this challenge. The suite has several applications: supporting language and cultural learning by making colloquial Arabic intelligible to students of MSA; supporting retrieval and prioritization for triage and content analysis by finding colloquial and dialectal Arabic terms that today's search engines miss, by providing appropriate interpretations of colloquial Arabic, which is opaque to current analytics approaches, and by identifying named entities, events, topics, and sentiment; and enabling improved translations by MSA-trained MT systems through a decrease in out-of-vocabulary terms, achieved by converting colloquial terms to MSA.
Anthology ID:
2010.amta-government.5
Volume:
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Government MT User Program
Month:
October 31-November 4
Year:
2010
Address:
Denver, Colorado, USA
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
URL:
https://aclanthology.org/2010.amta-government.5
Cite (ACL):
Kathleen Egan. 2010. Cross Lingual Arabic Blog Alerting (COLABA). In Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Government MT User Program, Denver, Colorado, USA. Association for Machine Translation in the Americas.
Cite (Informal):
Cross Lingual Arabic Blog Alerting (COLABA) (Egan, AMTA 2010)