Moran Mizrahi


2024

State of What Art? A Call for Multi-Prompt LLM Evaluation
Moran Mizrahi | Guy Kaplan | Dan Malkin | Rotem Dror | Dafna Shahaf | Gabriel Stanovsky
Transactions of the Association for Computational Linguistics, Volume 12

Recent advances in LLMs have led to an abundance of evaluation benchmarks, which typically rely on a single instruction template per task. We create a large-scale collection of instruction paraphrases and comprehensively analyze the brittleness introduced by single-prompt evaluations across 6.5M instances, involving 20 different LLMs and 39 tasks from 3 benchmarks. We find that different instruction templates lead to very different performance, both absolute and relative. Instead of single-prompt evaluation, we propose a set of diverse metrics on multiple instruction paraphrases, specifically tailored for different use cases (e.g., LLM vs. downstream development), ensuring a more reliable and meaningful assessment of LLM capabilities. We show that our metrics provide new insights into the strengths and limitations of current LLMs.

2020

Coming to Terms: Automatic Formation of Neologisms in Hebrew
Moran Mizrahi | Stav Yardeni Seelig | Dafna Shahaf
Findings of the Association for Computational Linguistics: EMNLP 2020

Spoken languages are ever-changing, with new words entering them all the time. However, coming up with new words (neologisms) today relies exclusively on human creativity. In this paper we propose a system to automatically suggest neologisms. We focus on the Hebrew language as a test case due to the unusual regularity of its noun formation. User studies comparing our algorithm to experts and non-experts demonstrate that our algorithm is capable of generating high-quality outputs, as well as enhancing human creativity. More broadly, we seek to inspire more computational work around the topic of linguistic creativity, which we believe offers numerous unexplored opportunities.