Getting the ##life out of living: How Adequate Are Word-Pieces for Modelling Complex Morphology?

Stav Klein; Reut Tsarfaty

doi:10.18653/v1/2020.sigmorphon-1.24

Abstract

This work investigates the most basic units that underlie contextualized word embeddings such as BERT: the so-called word-pieces (WPs). In Morphologically-Rich Languages (MRLs), which exhibit morphological fusion and non-concatenative morphology, the different units of meaning within a word may be fused or intertwined, and cannot be separated linearly. Therefore, when using word-pieces in MRLs, we must consider that: (1) a linear segmentation into sub-word units might not capture the full morphological complexity of words; and (2) representations that leave the morphological knowledge on sub-word units inaccessible might negatively affect performance. Here we empirically examine the capacity of word-pieces to capture morphology by investigating the task of multi-tagging in Modern Hebrew, as a proxy for evaluating the underlying segmentation. Our results show that, while models trained to predict multi-tags for complete words outperform models tuned to predict the distinct tags of WPs, we can improve WP tag prediction by purposefully constraining the word-pieces to reflect their internal functions. We suggest that linguistically-informed word-piece schemes that make the morphological structure explicit might boost performance for MRLs.

Anthology ID: 2020.sigmorphon-1.24
Volume: Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology
Month: July
Year: 2020
Address: Online
Editors: Garrett Nicolai, Kyle Gorman, Ryan Cotterell
Venue: SIGMORPHON
SIG: SIGMORPHON
Publisher: Association for Computational Linguistics
Pages: 204–209
URL: https://aclanthology.org/2020.sigmorphon-1.24/
DOI: 10.18653/v1/2020.sigmorphon-1.24
Cite (ACL): Stav Klein and Reut Tsarfaty. 2020. Getting the ##life out of living: How Adequate Are Word-Pieces for Modelling Complex Morphology? In Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 204–209, Online. Association for Computational Linguistics.
Cite (Informal): Getting the ##life out of living: How Adequate Are Word-Pieces for Modelling Complex Morphology? (Klein & Tsarfaty, SIGMORPHON 2020)
PDF: https://aclanthology.org/2020.sigmorphon-1.24.pdf
Video: http://slideslive.com/38929877
