Word-level Perturbation Considering Word Length and Compositional Subwords

Tatsuya Hiraoka, Sho Takase, Kei Uchiumi, Atsushi Keyaki, Naoaki Okazaki


Abstract
We present two simple modifications to word-level perturbation: Word Replacement considering Length (WR-L) and Compositional Word Replacement (CWR). In conventional word replacement, a word in the input is replaced with a word sampled from the entire vocabulary, regardless of the length and context of the target word. WR-L takes the length of the target word into account by sampling replacement words according to a Poisson distribution. CWR takes compositional candidates into account by restricting the sampling source to related words that appear in subword regularization. Experimental results showed that combining WR-L and CWR improved performance on text classification and machine translation.
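The abstract describes both modifications only at a high level, so the Python sketch below illustrates one possible reading. It is not the authors' implementation (their code is in the tathi/cwr repository). Two assumptions are labeled here and in the code comments. First, WR-L is assumed to draw a target length from a Poisson distribution whose mean is the length of the word being replaced, and then to pick uniformly among vocabulary words of that length. Second, CWR's "related words" are assumed to be the subwords a subword-regularization segmenter could emit for the word, approximated as every in-vocabulary substring other than the word itself. The function names (`sample_poisson`, `wr_l`, `compositional_candidates`, `cwr`) and the replacement probability `p` are hypothetical.

```python
import math
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw k ~ Poisson(lam) using Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= threshold:
            return k
        k += 1


def wr_l(tokens: list[str], vocab: list[str], p: float, rng: random.Random) -> list[str]:
    """Hypothetical WR-L sketch: with probability p, replace a token by a vocabulary
    word whose length is drawn from Poisson(len(token)) (assumed parameterization)."""
    by_len: dict[int, list[str]] = {}
    for w in vocab:
        by_len.setdefault(len(w), []).append(w)
    out = []
    for tok in tokens:
        if rng.random() >= p:
            out.append(tok)  # keep the token unchanged
            continue
        length = sample_poisson(len(tok), rng)
        # If no vocabulary word has exactly this length, use the closest available length.
        nearest = min(by_len, key=lambda n: (abs(n - length), n))
        out.append(rng.choice(by_len[nearest]))
    return out


def compositional_candidates(word: str, subword_vocab: set[str]) -> list[str]:
    """Subwords a subword-regularization segmenter could emit for `word`, approximated
    as every in-vocabulary substring other than the word itself (assumption)."""
    pieces = {
        word[i:j]
        for i in range(len(word))
        for j in range(i + 1, len(word) + 1)
        if word[i:j] in subword_vocab and word[i:j] != word
    }
    return sorted(pieces)


def cwr(tokens: list[str], subword_vocab: set[str], p: float, rng: random.Random) -> list[str]:
    """Hypothetical CWR sketch: with probability p, replace a token by one of its
    compositional candidates instead of a word from the whole vocabulary."""
    out = []
    for tok in tokens:
        cands = compositional_candidates(tok, subword_vocab)
        if cands and rng.random() < p:
            out.append(rng.choice(cands))
        else:
            out.append(tok)  # no candidates, or not selected for replacement
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    subwords = {"un", "relate", "related", "d", "ed"}
    print(compositional_candidates("unrelated", subwords))  # ['d', 'ed', 'relate', 'related', 'un']
    print(cwr(["unrelated", "cat"], subwords, p=1.0, rng=rng))
    print(wr_l(["the", "movie"], ["a", "an", "dog", "film", "actor"], p=1.0, rng=rng))
```

In this sketch the vocabulary, the subword vocabulary, and `p` are supplied by the caller. In practice the subword vocabulary would presumably come from a trained subword model such as a SentencePiece unigram model, which is likewise an assumption rather than something the abstract states.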
Anthology ID:
2022.findings-acl.258
Volume:
Findings of the Association for Computational Linguistics: ACL 2022
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3268–3275
URL:
https://aclanthology.org/2022.findings-acl.258
DOI:
10.18653/v1/2022.findings-acl.258
Cite (ACL):
Tatsuya Hiraoka, Sho Takase, Kei Uchiumi, Atsushi Keyaki, and Naoaki Okazaki. 2022. Word-level Perturbation Considering Word Length and Compositional Subwords. In Findings of the Association for Computational Linguistics: ACL 2022, pages 3268–3275, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Word-level Perturbation Considering Word Length and Compositional Subwords (Hiraoka et al., Findings 2022)
PDF:
https://aclanthology.org/2022.findings-acl.258.pdf
Code:
tathi/cwr