SeqMix: Augmenting Active Sequence Labeling via Sequence Mixup

Rongzhi Zhang, Yue Yu, Chao Zhang


Abstract
Active learning is an important technique for low-resource sequence labeling tasks. However, current active sequence labeling methods use only the queried samples in each iteration, which is an inefficient way of leveraging human annotations. We propose a simple but effective data augmentation method to improve the label efficiency of active sequence labeling. Our method, SeqMix, augments the queried samples by generating extra labeled sequences in each iteration. The key difficulty is to generate plausible sequences along with token-level labels. SeqMix addresses this challenge by performing mixup on both the sequences and the token-level labels of the queried samples. Furthermore, we design a discriminator for the sequence mixup step, which judges whether the generated sequences are plausible. Our experiments on Named Entity Recognition and Event Detection tasks show that SeqMix improves the standard active sequence labeling method by 2.27%–3.75% in terms of F1 score. The code and data for SeqMix can be found at https://github.com/rz-zhang/SeqMix.
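The abstract's core idea of mixing both sequences and token-level labels can be sketched as below. This is a minimal illustration, not the authors' implementation: the function names, the Beta(α, α) mixing coefficient, and the nearest-neighbor decoding step are assumptions for the sketch; the paper's actual pipeline additionally filters generated sequences with a plausibility discriminator.

```python
import numpy as np

def seq_mixup(emb_a, labels_a, emb_b, labels_b, alpha=8.0):
    """Mix two equal-length labeled token sequences (illustrative sketch).

    emb_a, emb_b: (seq_len, dim) token embeddings of the two sequences.
    labels_a, labels_b: (seq_len, num_classes) one-hot token-level labels.
    Returns convex combinations of both the embeddings and the labels,
    so each mixed token carries a soft label distribution.
    """
    lam = np.random.beta(alpha, alpha)  # mixing coefficient in [0, 1]
    mixed_emb = lam * emb_a + (1.0 - lam) * emb_b
    mixed_labels = lam * labels_a + (1.0 - lam) * labels_b
    return mixed_emb, mixed_labels

def to_tokens(mixed_emb, vocab_emb):
    """Decode mixed embeddings back to discrete tokens by nearest neighbor
    in a vocabulary embedding table (vocab_emb: (vocab_size, dim))."""
    dists = np.linalg.norm(vocab_emb[None, :, :] - mixed_emb[:, None, :], axis=-1)
    return dists.argmin(axis=1)  # (seq_len,) token ids
```

Because each mixed label is a convex combination of two one-hot vectors, it remains a valid probability distribution over classes, which is what lets the generated sequence be used directly as extra supervised training data.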
Anthology ID: 2020.emnlp-main.691
Volume: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month: November
Year: 2020
Address: Online
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 8566–8579
URL: https://aclanthology.org/2020.emnlp-main.691
DOI: 10.18653/v1/2020.emnlp-main.691
PDF: https://aclanthology.org/2020.emnlp-main.691.pdf
Optional supplementary material: 2020.emnlp-main.691.OptionalSupplementaryMaterial.zip
Video: https://slideslive.com/38938974
Code: rz-zhang/SeqMix
Data: CoNLL-2003