Handling Idioms in Symbolic Multilingual Natural Language Generation

Michaelle Dubé, François Lareau


Abstract
While idioms are usually very rigid in their expression, they sometimes allow a certain level of freedom in their usage, with modifiers or complements splitting them or being syntactically attached to internal nodes rather than to the root (e.g., “take something with a big grain of salt”). This means that they cannot always be handled as ready-made strings in rule-based natural language generation systems. Having access to the internal syntactic structure of an idiom allows for more subtle processing. We propose a way to enumerate all possible language-independent n-node trees and to map particular idioms of a language onto these generic syntactic patterns. Using this method, we integrate the idioms from the LN-fr into GenDR, a multilingual realizer. Our implementation covers nearly 98% of LN-fr’s idioms with high precision, and can easily be extended or ported to other languages.
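The abstract's central technique is to enumerate language-independent n-node syntactic trees and map each idiom onto one of them. It can be illustrated with a short, hypothetical sketch. This is not the authors' GenDR implementation. It makes four assumptions. Trees are unordered and carry labels on their edges. The relation names ("I", "II", "ATTR") are placeholders. Distinct trees are produced by growing smaller trees one leaf at a time and keeping canonical forms. An idiom is mapped to its pattern by stripping its lexemes. The idiom analyses in the usage lines are illustrative only and are not LN-fr's encoding.

```python
# Hypothetical sketch (not GenDR's code): enumerate lexically empty, edge-labeled,
# unordered trees with n nodes, and map an idiom's tree onto one of them.
# The relation labels and the idiom analyses below are illustrative assumptions.

LABELS = ("I", "II", "ATTR")  # assumed deep-syntactic relation names


def canon(children):
    """Canonical node: a sorted tuple of (relation, child) pairs; a leaf is ()."""
    return tuple(sorted(children))


def grow(tree, labels):
    """All canonical trees obtained by attaching one new leaf anywhere in `tree`."""
    out = set()
    for lab in labels:  # attach the new leaf directly under this node
        out.add(canon(tree + ((lab, ()),)))
    for i, (lab, child) in enumerate(tree):  # or somewhere inside one child
        for new_child in grow(child, labels):
            out.add(canon(tree[:i] + ((lab, new_child),) + tree[i + 1:]))
    return out


def all_trees(n, labels=LABELS):
    """Every distinct labeled tree with exactly n nodes (n >= 1)."""
    level = {()}  # the single one-node tree
    for _ in range(n - 1):
        level = {t for old in level for t in grow(old, labels)}
    return level


def pattern(node):
    """Strip lexemes from an idiom tree given as (lemma, [(relation, subtree), ...])."""
    _lemma, deps = node
    return canon((rel, pattern(sub)) for rel, sub in deps)


def size(tree):
    """Number of nodes in a canonical tree."""
    return 1 + sum(size(child) for _, child in tree)


# Illustrative analysis (an assumption): take -ATTR-> with -II-> grain -ATTR-> salt
idiom = ("take", [("ATTR", ("with", [("II", ("grain", [("ATTR", ("salt", []))]))]))])
# The same idiom with "big" attached to the internal node "grain", not to the root
modified = ("take", [("ATTR", ("with", [("II", ("grain", [
    ("ATTR", ("salt", [])), ("ATTR", ("big", []))]))]))])

for tree in (idiom, modified):
    p = pattern(tree)
    print(size(p), p in all_trees(size(p)))  # prints "4 True", then "5 True"
```

Growing trees one leaf at a time reaches every shape, because every tree with at least two nodes has a leaf whose removal leaves a smaller tree. Sorting the children at every level means that two trees differing only in sibling order collapse to a single pattern. The modified example shows why a ready-made string is not enough: the modifier hangs off an internal node, so the idiom's pattern changes from a 4-node to a 5-node tree.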
Anthology ID: 2022.mwe-1.17
Volume: Proceedings of the 18th Workshop on Multiword Expressions @LREC2022
Month: June
Year: 2022
Address: Marseille, France
Venue: MWE
SIG: SIGLEX
Publisher: European Language Resources Association
Pages: 118–126
URL: https://aclanthology.org/2022.mwe-1.17
Cite (ACL): Michaelle Dubé and François Lareau. 2022. Handling Idioms in Symbolic Multilingual Natural Language Generation. In Proceedings of the 18th Workshop on Multiword Expressions @LREC2022, pages 118–126, Marseille, France. European Language Resources Association.
Cite (Informal): Handling Idioms in Symbolic Multilingual Natural Language Generation (Dubé & Lareau, MWE 2022)
PDF: https://aclanthology.org/2022.mwe-1.17.pdf