David Doermann
2026
CRAFT: Critic-Refined Adaptive Key-Frame Targeting for Multimodal Video Question Answering
Mahesh Bhosale | Abdul Wasi | Vishvesh Trivedi | Pengyu Yan | Akhil V S S Gorugantu | David Doermann
Proceedings of the 2nd Workshop on Multimodal Augmented Generation via Multimodal Retrieval (MAGMaR 2026)
Grounded multi-video question answering over real-world news events requires systems to surface query-relevant evidence across heterogeneous video archives while attributing every claim to its supporting source. We introduce CRAFT (Critic-Refined Adaptive Key-Frame Targeting), a query-conditioned pipeline that combines dynamic keyframe selection, per-video ASR with multilingual fallback, and a hybrid critic loop to iteratively verify and repair claims before consolidation. The pipeline integrates UNLI temporal entailment, DeBERTa-v3 cross-claim screening, and a Llama-3.2-3B adjudicator, with a final citation-merging stage that emits each fact once with all supporting source identifiers. On MAGMaR 2026, CRAFT achieves the best overall average (0.739), reference recall (0.810), and citation F1 (0.635). We further evaluate on a MAGMaR-style conversion of WikiVideo with 52 non-overlapping event queries, where CRAFT also performs strongly (0.823 Avg), showing that its claim-centric evidence aggregation generalizes beyond MAGMaR. Ablations show that atomic claims, ASR, and the critic loop drive the main gains over the vanilla query-conditioned baseline. Code and implementation details are publicly available at https://github.com/bhosalems/CRAFT.
TRACE: Evidence Grounding-Guided Multi-Video Event Understanding and Claim Generation
Pengyu Yan | Akhil V S S Gorugantu | Mahesh Bhosale | Abdul Wasi | Vishvesh Trivedi | David Doermann
Proceedings of the 2nd Workshop on Multimodal Augmented Generation via Multimodal Retrieval (MAGMaR 2026)
Multi-video event understanding demands models that can locate and attribute query-relevant evidence scattered across long, heterogeneous video corpora. Existing large vision–language models (LVLMs) often underperform in this regime because they quickly exhaust their context budget and struggle to precisely localize evidentially important segments, frequently missing dense informational cues such as broadcast graphics, subtitles, and scoreboards. We introduce TRACE, an evidence grounding-guided framework that follows a ground-before-reasoning strategy for multi-video event reasoning. Our approach first builds a structured, text-searchable timeline for each video using OCR and object detection. A text-only LLM then conducts query-aware evidence localization, selecting relevant moments prior to any downstream visual reasoning. The retrieved frames and their grounding summaries are subsequently used to steer LVLM-based claim generation and cross-video citation consolidation. Experiments on MAGMaR 2026 and WikiVideo demonstrate that structured grounding markedly boosts factual completeness and attribution fidelity. On the MAGMaR validation split, TRACE raises macro-average MiRAGE F1 from 0.705 to 0.811 compared to an unguided Qwen3-VL-30B baseline, with especially strong improvements in citation recall (0.440 → 0.628). The method also attains state-of-the-art results on the official MAGMaR 2026 leaderboard.
2012
A Random Forest System Combination Approach for Error Detection in Digital Dictionaries
Michael Bloodgood | Peng Ye | Paul Rodrigues | David Zajic | David Doermann
Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data
Linguistic Resources for Handwriting Recognition and Translation Evaluation
Zhiyi Song | Safa Ismael | Stephen Grimes | David Doermann | Stephanie Strassel
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
We describe efforts to create corpora to support development and evaluation of handwriting recognition and translation technology. LDC has developed a stable pipeline and infrastructure for collecting and annotating handwriting linguistic resources to support the evaluation of MADCAT and OpenHaRT. We collect and annotate handwritten samples of pre-processed Arabic and Chinese data that has already been translated into English and that is used in the GALE program. To date, LDC has recruited more than 600 scribes and collected, annotated, and released more than 225,000 handwriting images. Most linguistic resources created for these programs will be made available to the larger research community through publication in LDC's catalog. The phase 1 MADCAT corpus is now available.
Leveraging Statistical Transliteration for Dictionary-Based English-Bengali CLIR of OCR'd Text
Utpal Garain | Arjun Das | David Doermann | Douglas Oard
Proceedings of COLING 2012: Posters
2011
Cross-Language Entity Linking
Paul McNamee | James Mayfield | Dawn Lawrie | Douglas Oard | David Doermann
Proceedings of 5th International Joint Conference on Natural Language Processing
2006
Morphology Induction from Limited Noisy Data Using Approximate String Matching
Burcu Karagol-Ayan | David Doermann | Amy Weinberg
Proceedings of the Eighth Meeting of the ACL Special Interest Group on Computational Phonology and Morphology at HLT-NAACL 2006
Adaptive Transformation-Based Learning for Improving Dictionary Tagging
Burcu Karagol-Ayan | David Doermann | Amy Weinberg
11th Conference of the European Chapter of the Association for Computational Linguistics
2003
Desparately Seeking Cebuano
Douglas W. Oard | David Doermann | Bonnie Dorr | Daqing He | Philip Resnik | Amy Weinberg | William Byrne | Sanjeev Khudanpur | David Yarowsky | Anton Leuski | Philipp Koehn | Kevin Knight
Companion Volume of the Proceedings of HLT-NAACL 2003 - Short Papers
Acquisition of bilingual MT lexicons from OCRed dictionaries
Burcu Karagol-Ayan | David Doermann | Bonnie J. Dorr
Proceedings of Machine Translation Summit IX: Papers
This paper describes an approach to analyzing the lexical structure of OCRed bilingual dictionaries to construct resources suited for machine translation of low-density languages, where online resources are limited. A rule-based method, an HMM-based method, and a post-processed HMM-based method are used for rapid construction of MT lexicons based on systematic structural clues provided in the original dictionary. We evaluate the effectiveness of our techniques, concluding that: (1) the rule-based method performs better with dictionaries where the font is not an important distinguishing feature for determining information types; (2) the post-processed stochastic method improves the results of the stochastic method for phrasal entries; and (3) our resulting bilingual lexicons are comprehensive enough to provide the basis for reasonable translation results when compared to human translations.
Co-authors
- Burcu Karagol-Ayan 3
- Douglas W. Oard 3
- Amy Weinberg 3
- Mahesh Bhosale 2
- Bonnie Dorr 2
- Akhil V S S Gorugantu 2
- Vishvesh Trivedi 2
- Abdul Wasi 2
- Pengyu Yan 2
- Michael Bloodgood 1
- Bill Byrne 1
- Arjun Das 1
- Utpal Garain 1
- Stephen Grimes 1
- Daqing He 1
- Safa Ismael 1
- Sanjeev Khudanpur 1
- Kevin Knight 1
- Philipp Koehn 1
- Dawn Lawrie 1
- Anton Leuski 1
- James Mayfield 1
- Paul McNamee 1
- Philip Resnik 1
- Paul Rodrigues 1
- Zhiyi Song 1
- Stephanie Strassel 1
- David Yarowsky 1
- Peng Ye 1
- David Zajic 1