SMSMix: Sense-Maintained Sentence Mixup for Word Sense Disambiguation

Hee Suk Yoon, Eunseop Yoon, John Harvill, Sunjae Yoon, Mark Hasegawa-Johnson, Chang Yoo


Abstract
Word Sense Disambiguation (WSD) is an NLP task aimed at determining the correct sense of a word in a sentence from a discrete set of candidate senses. Although current systems have attained unprecedented performance on this task, the nonuniform distribution of word senses during training generally causes them to perform poorly on rare senses. To address this, we use data augmentation to increase the frequency of the least frequent senses (LFS) and thereby reduce the distributional bias of senses during training. We propose Sense-Maintained Sentence Mixup (SMSMix), a novel word-level mixup method that maintains the sense of a target word. SMSMix smoothly blends two sentences using mask prediction while preserving the span relevant to the target word, as determined by saliency scores, so that the word's sense is maintained. To the best of our knowledge, this is the first attempt to apply mixup in NLP while preserving the meaning of a specific word. Through extensive experiments, we show that our augmentation method effectively provides more information about rare senses during training while maintaining the target sense label.
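The abstract describes SMSMix only at a high level, so the following is a minimal, hypothetical Python sketch of the general idea of sense-maintained splicing, not the authors' implementation. It assumes per-token saliency scores are already available (how they are computed is not specified here), picks the fixed-length window around the target word with the largest total saliency, and splices that window into a second sentence, leaving `[MASK]` tokens at the seams where a masked language model could fill in connecting words. The MLM filling step is omitted, and the names `salient_span` and `splice_mix`, the window length, and the midpoint split of the second sentence are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def salient_span(saliency: Sequence[float], target: int, span_len: int) -> Tuple[int, int]:
    """Return (start, end) of the `span_len`-token window that contains
    `target` and has the largest total saliency (end is exclusive)."""
    n = len(saliency)
    if not 0 <= target < n:
        raise ValueError("target index out of range")
    if span_len < 1:
        raise ValueError("span_len must be positive")
    span_len = min(span_len, n)
    # Candidate windows must cover the target index and fit inside the sentence.
    lo = max(0, target - span_len + 1)
    hi = min(target, n - span_len)
    best = max(range(lo, hi + 1), key=lambda s: sum(saliency[s:s + span_len]))
    return best, best + span_len


def splice_mix(
    sent_a: Sequence[str],
    sent_b: Sequence[str],
    saliency_a: Sequence[float],
    target: int,
    span_len: int,
    mask_token: str = "[MASK]",
) -> Tuple[List[str], int]:
    """Hypothetical sense-maintained mix: copy the salient span of sentence A
    (which contains the target word) into sentence B, with a mask token at each
    seam for an MLM to fill later. Returns the mixed tokens and the target's new index."""
    if len(sent_a) != len(saliency_a):
        raise ValueError("one saliency score per token of sent_a is required")
    start, end = salient_span(saliency_a, target, span_len)
    cut = len(sent_b) // 2  # assumption: split sentence B at its midpoint
    left = list(sent_b[:cut]) + [mask_token]
    right = [mask_token] + list(sent_b[cut:])
    mixed = left + list(sent_a[start:end]) + right
    return mixed, len(left) + (target - start)


if __name__ == "__main__":
    sent_a = "he sat on the bank of the river".split()
    sal = [0.1, 0.2, 0.1, 0.1, 0.9, 0.3, 0.2, 0.8]  # made-up saliency scores
    mixed, t = splice_mix(sent_a, "we walked there yesterday afternoon".split(), sal, 4, 3)
    print(" ".join(mixed))  # -> we walked [MASK] bank of the [MASK] there yesterday afternoon
    print(mixed[t])         # -> bank
```

Because the target word and its most salient context are copied verbatim, the augmented sentence can keep the original sense label instead of interpolating labels as standard mixup does; this is what would let such a scheme add training examples for rare senses.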
Anthology ID: 2022.findings-emnlp.107
Volume: Findings of the Association for Computational Linguistics: EMNLP 2022
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1493–1502
URL: https://aclanthology.org/2022.findings-emnlp.107
DOI: 10.18653/v1/2022.findings-emnlp.107
Cite (ACL): Hee Suk Yoon, Eunseop Yoon, John Harvill, Sunjae Yoon, Mark Hasegawa-Johnson, and Chang Yoo. 2022. SMSMix: Sense-Maintained Sentence Mixup for Word Sense Disambiguation. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 1493–1502, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): SMSMix: Sense-Maintained Sentence Mixup for Word Sense Disambiguation (Yoon et al., Findings 2022)
PDF: https://aclanthology.org/2022.findings-emnlp.107.pdf