Bits and Pieces: Investigating the Effects of Subwords in Multi-task Parsing across Languages and Domains

Daniel Dakota, Sandra Kübler


Abstract
Neural parsing is very dependent on the underlying language model. However, very little is known about how choices in the language model affect parsing performance, especially in multi-task learning. We investigate how the choice of subwords affects parsing and how subword sharing is responsible for gains or negative transfer in a multi-task setting where each task is parsing of a specific domain of the same language. More specifically, we investigate these issues across four languages: English, German, Italian, and Turkish. We find a general preference for averaged or last subwords across languages and domains. However, specific POS tags may require different subwords, and the distributional overlap between subwords across domains is perhaps a more influential factor in determining positive or negative transfer than discrepancies in the data sizes.
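For readers unfamiliar with the term, "choice of subwords" refers to how the vectors of a word's subword pieces are reduced to one word-level vector before parsing. The block below is a minimal sketch of that idea only, not the authors' implementation: the function name `pool_subwords`, the toy vectors, and the "first" strategy are assumptions added for illustration (the abstract itself names only averaged and last subwords).

```python
from typing import List, Sequence

Vector = List[float]


def pool_subwords(subword_vectors: Sequence[Vector], strategy: str = "average") -> Vector:
    """Collapse the vectors of one word's subwords into a single word vector.

    Hypothetical helper for illustration; the strategy names are assumptions.
    """
    if not subword_vectors:
        raise ValueError("a word must have at least one subword vector")
    if strategy == "first":
        # Keep only the vector of the first subword.
        return list(subword_vectors[0])
    if strategy == "last":
        # Keep only the vector of the last subword.
        return list(subword_vectors[-1])
    if strategy == "average":
        # Element-wise mean over all subword vectors of the word.
        n = len(subword_vectors)
        return [sum(dim) / n for dim in zip(*subword_vectors)]
    raise ValueError(f"unknown strategy: {strategy}")


# Toy 2-dimensional vectors for a word split into two hypothetical subwords.
toy = [[1.0, 2.0], [3.0, 6.0]]
first = pool_subwords(toy, "first")
last = pool_subwords(toy, "last")
avg = pool_subwords(toy, "average")
```

In a real parser the vectors would come from the language model's output for each subword; only the pooling step is shown here.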
Anthology ID:
2024.lrec-main.215
Original:
2024.lrec-main.215v1
Version 2:
2024.lrec-main.215v2
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
2397–2409
URL:
https://aclanthology.org/2024.lrec-main.215
Cite (ACL):
Daniel Dakota and Sandra Kübler. 2024. Bits and Pieces: Investigating the Effects of Subwords in Multi-task Parsing across Languages and Domains. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 2397–2409, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Bits and Pieces: Investigating the Effects of Subwords in Multi-task Parsing across Languages and Domains (Dakota & Kübler, LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.215.pdf