Qomhrá: A Bilingual Irish and English Large Language Model

Joseph McInerney; Khanh-Tung Tran; Liam Lonergan; Neasa Ní Chiaráin; Ailbhe Ní Chasaide; Barry Devereux

Abstract

Large language model (LLM) research and development has overwhelmingly focused on the world's major languages, leaving low-resource languages such as Irish under-represented. This paper introduces Qomhrá, a bilingual Irish and English LLM developed under extremely low-resource constraints. We outline a complete pipeline spanning bilingual continued pre-training, instruction tuning, and the synthesis of human preference data for future alignment training. We address the lack of scalable methods for creating human preference data by proposing a novel method that synthesises such data by prompting an LLM to generate "accepted" and "rejected" responses, which we validate as aligning with L1 Irish speakers. To select an LLM for synthesis, we evaluate the top closed-weight LLMs on Irish-language generation performance. Gemini-2.5-Pro is ranked highest by L1 and L2 Irish speakers, diverging from LLM-as-a-judge ratings and indicating a misalignment between current LLMs and the Irish-language community. Subsequently, we use Gemini-2.5-Pro to translate a large-scale English-language instruction-tuning dataset into Irish and to synthesise a first-of-its-kind Irish-language human preference dataset. We comprehensively evaluate Qomhrá across several benchmarks testing translation, gender understanding, topic identification, and world knowledge; these evaluations show gains of up to 29% in Irish and 44% in English compared to the existing open-source Irish LLM baseline, UCCIX. The results of our framework provide insight and guidance for developing LLMs for both Irish and other low-resource languages.
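
As a rough illustration of the preference-data synthesis described in the abstract, the Python sketch below builds one "accepted"/"rejected" pair for a single instruction. It is a minimal sketch under stated assumptions, not the authors' implementation: the prompt templates, the `synthesise_preference_pair` helper, the record layout, and the `stub_generate` stand-in for an LLM call are all hypothetical and do not come from the paper.

```python
from typing import Callable, Dict

# Hypothetical prompt templates (assumptions); the paper's actual prompts
# are not reproduced on this page.
ACCEPTED_TEMPLATE = (
    "Respond in fluent, natural Irish to the following instruction:\n{instruction}"
)
REJECTED_TEMPLATE = (
    "Respond in Irish to the following instruction, "
    "but with a lower-quality answer:\n{instruction}"
)


def synthesise_preference_pair(
    instruction: str, generate: Callable[[str], str]
) -> Dict[str, str]:
    """Build one preference record by prompting `generate` twice.

    `generate` is any callable that maps a prompt string to a response
    string, for example a thin wrapper around an LLM API client.
    """
    accepted = generate(ACCEPTED_TEMPLATE.format(instruction=instruction))
    rejected = generate(REJECTED_TEMPLATE.format(instruction=instruction))
    return {"prompt": instruction, "accepted": accepted, "rejected": rejected}


def stub_generate(prompt: str) -> str:
    """Offline stand-in for an LLM call, used only to keep the sketch runnable."""
    label = "lower-quality" if "lower-quality" in prompt else "fluent"
    return f"[{label} response placeholder]"


pair = synthesise_preference_pair("Describe today's weather.", stub_generate)
print(pair)
```

In practice the stub would be replaced by a call to the chosen synthesis model, and the resulting pairs would still need validation against human judgements, as the abstract reports doing with L1 Irish speakers.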

Anthology ID: 2026.loreslm-1.18
Volume: Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Hansi Hettiarachchi, Tharindu Ranasinghe, Alistair Plum, Paul Rayson, Ruslan Mitkov, Mohamed Gaber, Damith Premasiri, Fiona Anting Tan, Lasitha Uyangodage
Venue: LoResLM
Publisher: Association for Computational Linguistics
Pages: 189–199
URL: https://aclanthology.org/2026.loreslm-1.18/
Cite (ACL): Joseph McInerney, Khanh-Tung Tran, Liam Lonergan, Neasa Ní Chiaráin, Ailbhe Ní Chasaide, and Barry Devereux. 2026. Qomhrá: A Bilingual Irish and English Large Language Model. In Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026), pages 189–199, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): Qomhrá: A Bilingual Irish and English Large Language Model (McInerney et al., LoResLM 2026)
PDF: https://aclanthology.org/2026.loreslm-1.18.pdf
