Cross-Lingual Lemmatization and Morphology Tagging with Two-Stage Multilingual BERT Fine-Tuning

Dan Kondratyuk


Abstract
We present our CHARLES-SAARLAND system for the SIGMORPHON 2019 Shared Task on Crosslinguality and Context in Morphology, Task 2: Morphological Analysis and Lemmatization in Context. We leverage the multilingual BERT model and apply several fine-tuning strategies introduced by UDify, demonstrating exceptional evaluation performance on morpho-syntactic tasks. Our results show that fine-tuning multilingual BERT on the concatenation of all available treebanks lets the model learn cross-lingual information that boosts lemmatization and morphology tagging accuracy over purely monolingual fine-tuning. Unlike UDify, however, we show that when the model is paired with additional character-level and word-level LSTM layers, a second stage of fine-tuning on each treebank individually can improve evaluation performance even further. Of all submissions to this shared task, our system achieves the highest average accuracy and F1 score in morphology tagging and places second in average lemmatization accuracy.
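The two-stage schedule described in the abstract can be sketched roughly as follows. This is only a toy illustration under stated assumptions, not the system's actual training code: `fine_tune`, `two_stage_fine_tune`, and the dictionary standing in for a model are hypothetical names introduced here, and the "training" merely records how many sentences each stage saw.

```python
from typing import Dict, List

Sentence = str          # placeholder for an annotated sentence
Model = Dict[str, int]  # toy stand-in for model weights (hypothetical)


def fine_tune(model: Model, data: List[Sentence], tag: str) -> Model:
    """Hypothetical stand-in for a fine-tuning run.

    Returns a copy of the model that records how many sentences
    were seen under the given stage tag.
    """
    updated = dict(model)
    updated[tag] = updated.get(tag, 0) + len(data)
    return updated


def two_stage_fine_tune(
    pretrained: Model, treebanks: Dict[str, List[Sentence]]
) -> Dict[str, Model]:
    # Stage 1: fine-tune a single model on the concatenation of all treebanks.
    all_data = [sent for sents in treebanks.values() for sent in sents]
    multilingual = fine_tune(pretrained, all_data, "stage1_all")

    # Stage 2: starting from the stage-1 model, fine-tune a separate copy
    # on each treebank individually.
    return {
        name: fine_tune(multilingual, sents, f"stage2_{name}")
        for name, sents in treebanks.items()
    }


# Toy usage with made-up treebank names and sentences.
treebanks = {"en": ["a", "b"], "cs": ["c"]}
models = two_stage_fine_tune({}, treebanks)
```

Each entry of `models` starts from the same stage-1 state, so the per-treebank models share the cross-lingual training and differ only in their second stage.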
Anthology ID:
W19-4203
Volume:
Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Garrett Nicolai, Ryan Cotterell
Venue:
ACL
SIG:
SIGMORPHON
Publisher:
Association for Computational Linguistics
Pages:
12–18
URL:
https://aclanthology.org/W19-4203
DOI:
10.18653/v1/W19-4203
Cite (ACL):
Dan Kondratyuk. 2019. Cross-Lingual Lemmatization and Morphology Tagging with Two-Stage Multilingual BERT Fine-Tuning. In Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 12–18, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Cross-Lingual Lemmatization and Morphology Tagging with Two-Stage Multilingual BERT Fine-Tuning (Kondratyuk, ACL 2019)
PDF:
https://aclanthology.org/W19-4203.pdf
Code:
hyperparticle/udify