Rareş-Alexandru Roşcan
Also published as: Rares-Alexandru Roscan
2026
Archaeology at MWE-2026 PARSEME 2.0 Subtask 1 and 2: Parsing is for Encoders, Paraphrasing is for LLMs
Rares-Alexandru Roscan | Sergiu Nisioi
Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026)
This paper presents our approach to the PARSEME 2.0 Shared Task on Romanian, covering both Identification (Subtask 1) and Paraphrasing (Subtask 2). While Large Language Models (LLMs) excel at semantic generation, we hypothesize that they lack the structural precision required for MWE identification, producing “boundary hallucinations” that compromise downstream simplification. Our Rank 1 results on Romanian confirm this: a specialized encoder (RoBERT) using standard sequence labeling outperforms both few-shot LLMs and complex structural parsers (MTLB-STRUCT). This justifies our proposed pipeline: using encoders as precise “pointers” to guide the generative power of LLMs.
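The pipeline the abstract describes can be sketched as follows: decode the encoder's BIO sequence labels into MWE spans, then use the identified span to anchor the LLM's paraphrase prompt. This is a minimal illustrative sketch; the label names, example sentence, and prompt template are assumptions, not the authors' exact setup.

```python
# Hedged sketch: turning BIO labels from an encoder-based MWE tagger
# into spans, then building a paraphrase prompt for an LLM.
# Label inventory ("B-MWE"/"I-MWE"/"O") and the prompt wording are
# illustrative assumptions, not the system described in the paper.

def bio_to_spans(tokens, labels):
    """Collect (start, end) token spans from a B-MWE/I-MWE/O sequence."""
    spans, start = [], None
    for i, lab in enumerate(labels):
        if lab == "B-MWE":
            if start is not None:          # close a previous span
                spans.append((start, i))
            start = i
        elif lab == "I-MWE" and start is not None:
            continue                        # extend the open span
        else:                               # "O" (or stray I-MWE) closes it
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(labels)))
    return spans

# Romanian idiom "a da ortul popii" (roughly: "to kick the bucket")
tokens = ["El", "a", "dat", "ortul", "popii", "ieri"]
labels = ["O", "O", "B-MWE", "I-MWE", "I-MWE", "O"]

spans = bio_to_spans(tokens, labels)
mwes = [" ".join(tokens[s:e]) for s, e in spans]

# The identified span then "points" the LLM at what to paraphrase (Subtask 2):
prompt = f"Paraphrase the idiom '{mwes[0]}' in context: {' '.join(tokens)}"
```

Keeping span extraction in the encoder stage and handing the LLM only a pre-localized target is exactly the division of labor the abstract argues for: the generator never has to guess MWE boundaries.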
2025
Archaeology at TSAR 2025 Shared Task: Teaching Small Models to do CEFR Simplifications
Rareş-Alexandru Roşcan | Sergiu Nisioi
Proceedings of the Fourth Workshop on Text Simplification, Accessibility and Readability (TSAR 2025)
Large language models (LLMs) have demonstrated strong performance in text simplification tasks, but their high computational cost and proprietary nature often limit practical use, especially in education. We explore open-source LLMs for CEFR-level text simplification. By reducing model size and computational requirements, our approach enables greater accessibility and deployment in educational environments. Our results show some of the lowest error rates in producing CEFR-compliant texts at TSAR 2025, using models with 8 billion and 1 billion parameters. Such approaches have the potential to democratize NLP technologies for real-world applications.
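The setup the abstract outlines (prompting a small open model to target a CEFR level, then checking the output for compliance) can be sketched minimally. The prompt template and the length-based check below are illustrative assumptions only; the paper's actual models, prompts, and CEFR evaluation are not reproduced here.

```python
# Hedged sketch: a CEFR-targeted simplification prompt for a small
# open-source instruct model, plus a very crude compliance proxy.
# Template wording and the word-count threshold are assumptions,
# not the authors' method.

CEFR_PROMPT = (
    "Rewrite the following text so that a {level}-level (CEFR) learner "
    "can understand it. Keep the meaning unchanged.\n\n"
    "Text: {text}\nSimplified:"
)

def build_prompt(text, level="A2"):
    """Fill the (hypothetical) template for a given CEFR target level."""
    return CEFR_PROMPT.format(level=level, text=text)

def avg_sentence_length(text):
    """Rough readability proxy: mean words per sentence."""
    raw = text.replace("!", ".").replace("?", ".").split(".")
    sentences = [s for s in raw if s.strip()]
    return sum(len(s.split()) for s in sentences) / max(len(sentences), 1)

prompt = build_prompt("The municipality ratified the ordinance.", level="A2")
simplified = "The city agreed to the new rule. It starts soon."
```

A real compliance check would use a trained CEFR classifier rather than sentence length; the point of the proxy is only that level compliance is something one can measure and report error rates on, as the abstract does.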