BERT-based Idiom Identification using Language Translation and Word Cohesion

Arnav Yayavaram; Siddharth Yayavaram; Prajna Devi Upadhyay; Apurba Das

Abstract

An idiom is a multi-word expression whose meaning is figurative and cannot be deduced from the literal interpretation of its components. Idioms are prevalent in almost all languages and text genres, so comprehensive NLP systems must handle them explicitly. Such phrases are referred to as Potentially Idiomatic Expressions (PIEs), and identifying them automatically in text is a challenging task. In this paper, we propose a BERT-based model fine-tuned with custom objectives to improve the accuracy of detecting PIEs in text. Our custom loss functions capture two important properties, word cohesion and language translation, that distinguish PIEs from non-PIEs. We conducted several experiments on 7 datasets and showed that incorporating the custom objectives during training leads to substantial gains. Models trained with this approach also achieve higher sequence accuracy than DISC, a state-of-the-art PIE detection technique, and show good transfer capabilities.

Anthology ID: 2024.mwe-1.26
Volume: Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024
Month: May
Year: 2024
Address: Torino, Italia
Editors: Archna Bhatia, Gosse Bouma, A. Seza Doğruöz, Kilian Evang, Marcos Garcia, Voula Giouli, Lifeng Han, Joakim Nivre, Alexandre Rademaker
Venues: MWE | UDW | WS
SIGs: SIGLEX | SIGPARSE
Publisher: ELRA and ICCL
Pages: 220–230
URL: https://aclanthology.org/2024.mwe-1.26
Cite (ACL): Arnav Yayavaram, Siddharth Yayavaram, Prajna Devi Upadhyay, and Apurba Das. 2024. BERT-based Idiom Identification using Language Translation and Word Cohesion. In Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, pages 220–230, Torino, Italia. ELRA and ICCL.
Cite (Informal): BERT-based Idiom Identification using Language Translation and Word Cohesion (Yayavaram et al., MWE-UDW-WS 2024)
PDF: https://aclanthology.org/2024.mwe-1.26.pdf
