Croatian Idioms Integration: Enhancing the LIdioms Multilingual Linked Idioms Dataset

Ivana Filipović Petrović, Miguel López Otal, Slobodan Beliga


Abstract
Idioms, also referred to as phraseological units in some linguistic terminologies, are a subset of the broader category of multi-word expressions. However, idioms in Croatian, a low-resource language, lack representation in the Linguistic Linked Open Data (LLOD) cloud. To address this gap, we propose an extension of an existing RDF-based multilingual representation of idioms, the LIdioms dataset, which currently includes idioms from English, German, Italian, Portuguese, and Russian. This paper expands the existing resource by incorporating 1,042 Croatian idioms in the OntoLex-Lemon format. In addition, to foster translation initiatives and facilitate intercultural exchange, the added Croatian idioms have been linked to other idioms in the LIdioms dataset that share similar meanings despite differing in expression. This addition enriches the knowledge base of the LLOD community with a new language resource that includes Croatian idioms.
Anthology ID:
2024.lrec-main.366
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
4106–4112
URL:
https://aclanthology.org/2024.lrec-main.366
Cite (ACL):
Ivana Filipović Petrović, Miguel López Otal, and Slobodan Beliga. 2024. Croatian Idioms Integration: Enhancing the LIdioms Multilingual Linked Idioms Dataset. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 4106–4112, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Croatian Idioms Integration: Enhancing the LIdioms Multilingual Linked Idioms Dataset (Filipović Petrović et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.366.pdf