Debora Ciminari
2026
UniBO at MWE-2026 PARSEME 2.0 Subtask 2: A Cross-lingual Approach to Multiword Expression Paraphrasing
Debora Ciminari | Alberto Barrón-Cedeño
Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026)
This paper describes MISP (Multilingual Idiomatic Sentence Paraphrasing), a system submitted to the PARSEME 2.0 Multilingual Shared Task on Identification and Paraphrasing of Multiword Expressions (MWEs). We participated in Subtask 2 on MWE paraphrasing and built our system on Qwen3-4B-Instruct, fine-tuned on synthetic Portuguese MWE paraphrases. We applied MISP not only to Portuguese but also to French and Romanian, aiming to leverage cross-lingual transfer among related languages; ours was the only submission for Portuguese. Our results indicate that MISP struggles to generate paraphrases that both rephrase the MWE and preserve its original meaning. Additionally, instruction fine-tuning does not appear to improve performance. Overall, our findings highlight the challenges of paraphrasing MWEs, particularly in a cross-lingual setting.
2025
A Tough Hoe to Row: Instruction Fine-Tuning LLaMA 3.2 for Multilingual Idiom Processing
Debora Ciminari | Alberto Barrón-Cedeño
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)
MultiCoPIE: A Multilingual Corpus of Potentially Idiomatic Expressions for Cross-lingual PIE Disambiguation
Uliana Sentsova | Debora Ciminari | Josef Van Genabith | Cristina España-Bonet
Proceedings of the 21st Workshop on Multiword Expressions (MWE 2025)
Language models are able to handle compositionality and, to some extent, non-compositional phenomena such as semantic idiosyncrasy, a feature most prominent in idioms. This work introduces the MultiCoPIE corpus, which includes potentially idiomatic expressions (PIEs) in Catalan, Italian, and Russian, extending the language coverage of PIE corpus data. The new corpus provides additional linguistic features of idioms, such as their semantic compositionality, the part of speech of the idiom head, and their corresponding idiomatic expressions in English. With this new resource at hand, we first fine-tune an XLM-RoBERTa model to classify figurative and literal usage of potentially idiomatic expressions in English. We then study cross-lingual transfer to the languages represented in the MultiCoPIE corpus, evaluating the model’s ability to generalize an idiom-related task to languages not seen during fine-tuning. We show the effect of ‘cross-lingual lexical overlap’: the performance of the model, fine-tuned on English idiomatic expressions and tested on the MultiCoPIE languages, increases significantly when classifying ‘shared idioms’, i.e., idiomatic expressions that have direct counterparts in English with similar form and meaning. While this observation raises questions about the generalizability of cross-lingual learning, the results of the PIE experiments provide strong evidence of effective cross-lingual transfer, even when accounting for idioms that are similar across languages.