Not All Linearizations Are Equally Data-Hungry in Sequence Labeling Parsing

Alberto Muñoz-Ortiz, Michalina Strzyz, David Vilares


Abstract
Different linearizations have been proposed to cast dependency parsing as sequence labeling, solving the task as (i) a head selection problem, (ii) a representation of the token arcs as bracket strings, or (iii) the association of partial transition sequences of a transition-based parser with words. Yet there is little understanding of how these linearizations behave in low-resource setups. Here, we first study their data efficiency by simulating data-restricted setups from a diverse set of rich-resource treebanks. Second, we test whether such differences manifest in truly low-resource setups. The results show that head selection encodings are more data-efficient and perform better in an ideal (gold) framework, but that this advantage largely vanishes in favour of bracketing formats when the setup resembles a real-world low-resource configuration.
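To make the first two families more concrete, here is a minimal, illustrative sketch. It encodes head selection as a plain relative offset (label = head index minus word index). It also uses a simplified bracket encoding in which each arc becomes an opening bracket at its left end and a closing bracket at its right end. The function names, the bracket symbol convention and the toy sentence are hypothetical and are not necessarily the exact label sets used in the paper. Dependency relation labels are omitted, and the transition-based family (iii) is not sketched.

```python
from typing import List

# Hypothetical toy input: heads[i-1] is the head index of word i (words are
# 1-indexed, 0 stands for the dummy ROOT).


def encode_relative(heads: List[int]) -> List[int]:
    """Head selection as relative offsets: label_i = head_i - i."""
    return [h - i for i, h in enumerate(heads, start=1)]


def decode_relative(labels: List[int]) -> List[int]:
    """Invert encode_relative: head_i = i + label_i."""
    return [i + off for i, off in enumerate(labels, start=1)]


def encode_brackets(heads: List[int]) -> List[str]:
    """Simplified bracket encoding (illustrative symbol convention).

    Each arc becomes an opening bracket at its left end and a closing
    bracket at its right end:
      left arc  (head to the right): '<' on the dependent, '\\' on the head
      right arc (head to the left):  '/' on the head,      '>' on the dependent
    Arcs from ROOT are left implicit.
    """
    n = len(heads)
    closing = [""] * (n + 1)  # index 0 unused
    opening = [""] * (n + 1)
    for d, h in enumerate(heads, start=1):
        if h == 0:
            continue
        if h > d:
            opening[d] += "<"
            closing[h] += "\\"
        else:
            opening[h] += "/"
            closing[d] += ">"
    return [closing[i] + opening[i] for i in range(1, n + 1)]


def decode_brackets(labels: List[str]) -> List[int]:
    """Rebuild heads with one stack per arc direction.

    Assumes well-formed labels and that arcs of the same direction do not
    cross (true for projective trees). Unmatched brackets are not repaired:
    pop() on an empty stack raises IndexError.
    """
    heads = [0] * len(labels)    # words never attached default to ROOT
    left_stack: List[int] = []   # dependents waiting for a head on their right
    right_stack: List[int] = []  # heads waiting for a dependent on their right
    for i, label in enumerate(labels, start=1):
        # First close the arcs whose right end is word i ...
        for ch in label:
            if ch == "\\":
                heads[left_stack.pop() - 1] = i
            elif ch == ">":
                heads[i - 1] = right_stack.pop()
        # ... then open the arcs whose left end is word i.
        for ch in label:
            if ch == "<":
                left_stack.append(i)
            elif ch == "/":
                right_stack.append(i)
    return heads


if __name__ == "__main__":
    # "She ate a big apple": She->ate, ate->ROOT, a->apple, big->apple, apple->ate
    heads = [2, 0, 5, 5, 2]
    print(encode_relative(heads))            # [1, -2, 2, 1, -3]
    print(" ".join(encode_brackets(heads)))  # < \/ < < \\>
    assert decode_relative(encode_relative(heads)) == heads
    assert decode_brackets(encode_brackets(heads)) == heads
```

In this sketch the offset label names a word's head directly. A bracket label only records which arcs start or end at that word, and the heads are recovered by matching brackets across the sentence. In a real parser the predicted labels can be ill-formed for either encoding. This sketch does not attempt any repair.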
Anthology ID: 2021.ranlp-1.111
Volume: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Month: September
Year: 2021
Address: Held Online
Editors: Ruslan Mitkov, Galia Angelova
Venue: RANLP
Publisher: INCOMA Ltd.
Pages: 978–988
URL: https://aclanthology.org/2021.ranlp-1.111
Cite (ACL): Alberto Muñoz-Ortiz, Michalina Strzyz, and David Vilares. 2021. Not All Linearizations Are Equally Data-Hungry in Sequence Labeling Parsing. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 978–988, Held Online. INCOMA Ltd.
Cite (Informal): Not All Linearizations Are Equally Data-Hungry in Sequence Labeling Parsing (Muñoz-Ortiz et al., RANLP 2021)
PDF: https://aclanthology.org/2021.ranlp-1.111.pdf