2023
pdf
bib
abs
Massively Multi-Lingual Event Understanding: Extraction, Visualization, and Search
Chris Jenkins
|
Shantanu Agarwal
|
Joel Barry
|
Steven Fincke
|
Elizabeth Boschee
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
In this paper, we present ISI-Clear, a state-of-the-art, cross-lingual, zero-shot event extraction system and accompanying user interface for event visualization & search. Using only English training data, ISI-Clear makes global events available on-demand, processing user-supplied text in 100 languages ranging from Afrikaans to Yiddish. We provide multiple event-centric views of extracted events, including both a graphical representation and a document-level summary. We also integrate existing cross-lingual search algorithms with event extraction capabilities to provide cross-lingual event-centric search, allowing English-speaking users to search over events automatically extracted from a corpus of non-English documents, using either English natural language queries (e.g. “cholera outbreaks in Iran”) or structured queries (e.g. find all events of type Disease-Outbreak with agent “cholera” and location “Iran”).
2020
pdf
bib
abs
SEARCHER: Shared Embedding Architecture for Effective Retrieval
Joel Barry
|
Elizabeth Boschee
|
Marjorie Freedman
|
Scott Miller
Proceedings of the workshop on Cross-Language Search and Summarization of Text and Speech (CLSSTS2020)
We describe an approach to cross lingual information retrieval that does not rely on explicit translation of either document or query terms. Instead, both queries and documents are mapped into a shared embedding space where retrieval is performed. We discuss potential advantages of the approach in handling polysemy and synonymy. We present a method for training the model, and give details of the model implementation. We present experimental results for two cases: Somali-English and Bulgarian-English CLIR.
2019
pdf
bib
abs
SARAL: A Low-Resource Cross-Lingual Domain-Focused Information Retrieval System for Effective Rapid Document Triage
Elizabeth Boschee
|
Joel Barry
|
Jayadev Billa
|
Marjorie Freedman
|
Thamme Gowda
|
Constantine Lignos
|
Chester Palen-Michel
|
Michael Pust
|
Banriskhem Kayang Khonglah
|
Srikanth Madikeri
|
Jonathan May
|
Scott Miller
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations
With the increasing democratization of electronic media, vast information resources are available in less-frequently-taught languages such as Swahili or Somali. That information, which may be crucially important and not available elsewhere, can be difficult for monolingual English speakers to effectively access. In this paper we present an end-to-end cross-lingual information retrieval (CLIR) and summarization system for low-resource languages that 1) enables English speakers to search foreign language repositories of text and audio using English queries, 2) summarizes the retrieved documents in English with respect to a particular information need, and 3) provides complete transcriptions and translations as needed. The SARAL system achieved the top end-to-end performance in the most recent IARPA MATERIAL CLIR+summarization evaluations. Our demonstration system provides end-to-end open query retrieval and summarization capability, and presents the original source text or audio, speech transcription, and machine translation, for two low resource languages.