Reihaneh Iranmanesh
2026
Segmentation Strategy Matters: Benchmarking Whisper on Persian YouTube Content
Reihaneh Iranmanesh | Rojin Ziaei | Joe Garman
The Proceedings of the First Workshop on NLP and LLMs for the Iranian Language Family
Automatic Speech Recognition (ASR) transcription accuracy remains highly sensitive to audio segmentation strategies, yet most benchmarks assume oracle timestamps that are unavailable in deployment. We systematically evaluate how audio segmentation affects Whisper's performance on 10 hours of Persian YouTube content, comparing transcript-aligned (oracle) versus silence-based (realistic) approaches across contrasting acoustic conditions. Results reveal a striking content-type dependency: podcast content benefits from timestamp segmentation (33% lower mean WER), while entertainment content favors silence-based segmentation (8% lower mean WER). This finding demonstrates that optimal segmentation must be content-aware, with silence detection better capturing natural boundaries in acoustically heterogeneous media while avoiding mid-utterance splits. We publicly release our evaluation framework, 10 hours of audio with gold transcripts, and segmentation results here: https://github.com/ri164-bolleit/persian-youtube-whisper-benchmark
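For readers unfamiliar with the metric, the comparisons above are stated in word error rate (WER). A minimal illustrative sketch of standard WER via word-level Levenshtein distance follows; it is not the paper's released evaluation framework, and the function name is my own:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all remaining reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # match or substitution
            )
    return dp[-1][-1] / len(ref)
```

A "33% lower mean WER" in the abstract refers to a relative reduction in this quantity, averaged over segments.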
2025
The Structural Safety Generalization Problem
Julius Broomfield | Tom Gibbs | George Ingebretsen | Ethan Kosak-Hine | Tia Nasir | Jason Zhang | Reihaneh Iranmanesh | Sara Pieri | Reihaneh Rabbany | Kellin Pelrine
Findings of the Association for Computational Linguistics: ACL 2025
LLM jailbreaks are a widespread safety challenge. Given that this problem has not yet proven tractable, we suggest targeting a key failure mechanism: the failure of safety to generalize across semantically equivalent inputs. We further focus the target by requiring desirable tractability properties of the attacks we study: explainability, transferability between models, and transferability between goals. We perform red-teaming within this framework by uncovering new vulnerabilities to multi-turn, multi-image, and translation-based attacks. These attacks are semantically equivalent by design to their single-turn, single-image, or untranslated counterparts, enabling systematic comparisons; we show that the different structures yield different safety outcomes. We then demonstrate the potential for this framework to enable new defenses by proposing a Structure Rewriting Guardrail, which converts an input to a structure more conducive to safety assessment. This guardrail significantly improves refusal of harmful inputs without over-refusing benign ones. Thus, by framing this intermediate challenge, which is more tractable than universal defenses but essential for long-term safety, we highlight a critical milestone for AI safety research.
Generating Text from Uniform Meaning Representation
Emma Markle | Reihaneh Iranmanesh | Shira Wein
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Uniform Meaning Representation (UMR) is a recently developed graph-based semantic representation, which expands on Abstract Meaning Representation (AMR) in a number of ways, in particular through the inclusion of document-level information and multilingual flexibility. In order to effectively adopt and leverage UMR for downstream tasks, effort must be directed toward developing a UMR technological ecosystem. Though only a small number of UMR annotations have been produced to date, in this work we investigate the first approaches to producing text from multilingual UMR graphs. Exploiting the structural similarity between UMR and AMR graphs and the wide availability of AMR technologies, we introduce (1) a baseline approach which passes UMR graphs directly to AMR-to-text generation models, (2) a pipeline which converts UMR to AMR and then applies AMR-to-text generation models, and (3) a fine-tuning approach for both foundation models and AMR-to-text generation models using UMR data. Our best performing models achieve multilingual BERTScores of 0.825 for English and 0.882 for Chinese, a promising indication of the effectiveness of fine-tuning approaches for UMR-to-text generation even with limited UMR data.