Multilingual Abstract Meaning Representation for Celtic Languages

Johannes Heinecke, Anastasia Shimorina


Abstract
Deep semantic parsing into Abstract Meaning Representation (AMR) graphs has reached high quality with neural seq2seq approaches. However, the training corpus for AMR is only available for English. Several approaches for processing other languages exist, but only for high-resource languages. We present an approach to create a multilingual text-to-AMR model for three Celtic languages: Welsh (P-Celtic) and the closely related Irish and Scottish Gaelic (Q-Celtic). The success of this approach is due mainly to the underlying multilingual transformers such as mT5. Finally, we show that machine-translated test corpora unfairly improve the AMR evaluation scores by about 1 or 2 points (depending on the language).
Anthology ID:
2022.cltw-1.1
Volume:
Proceedings of the 4th Celtic Language Technology Workshop within LREC2022
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Theodorus Fransen, William Lamb, Delyth Prys
Venue:
CLTW
Publisher:
European Language Resources Association
Pages:
1–6
URL:
https://aclanthology.org/2022.cltw-1.1
Cite (ACL):
Johannes Heinecke and Anastasia Shimorina. 2022. Multilingual Abstract Meaning Representation for Celtic Languages. In Proceedings of the 4th Celtic Language Technology Workshop within LREC2022, pages 1–6, Marseille, France. European Language Resources Association.
Cite (Informal):
Multilingual Abstract Meaning Representation for Celtic Languages (Heinecke & Shimorina, CLTW 2022)
PDF:
https://aclanthology.org/2022.cltw-1.1.pdf