Learning to Contextually Aggregate Multi-Source Supervision for Sequence Labeling
Ouyu Lan | Xiao Huang | Bill Yuchen Lin | He Jiang | Liyuan Liu | Xiang Ren
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Sequence labeling is a fundamental task underlying a range of natural language processing problems. In practice, its performance depends heavily on the quality and quantity of annotations, yet ground-truth labels are often costly to obtain. In many cases, ground-truth labels do not exist, but noisy annotations or annotations from different domains are accessible. In this paper, we propose Consensus Network (ConNet), a novel framework that can be trained on annotations from multiple sources (e.g., crowd annotation, cross-domain data). It learns an individual representation for every source and dynamically aggregates source-specific knowledge with a context-aware attention module. The result is a model that reflects the agreement (consensus) among multiple sources. We evaluate the proposed framework in two practical settings of multi-source learning: learning with crowd annotations and unsupervised cross-domain model adaptation. Extensive experimental results show that our model achieves significant improvements over existing methods in both settings. We also demonstrate that the method applies to various tasks and works with different encoders.
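The aggregation step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how attention scores are computed, so the dot-product scoring, the function names (`softmax`, `aggregate_sources`), and the vector shapes below are all assumptions chosen for illustration. The sketch only shows the general idea of weighting source-specific outputs by how well each source matches the current context.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_sources(context: List[float],
                      source_keys: List[List[float]],
                      source_outputs: List[List[float]]) -> List[float]:
    """Combine per-source outputs with context-dependent attention weights.

    Assumption: each source has a key vector, and its attention score is the
    dot product between that key and the context vector (hypothetical choice).
    """
    scores = [sum(c * k for c, k in zip(context, key)) for key in source_keys]
    weights = softmax(scores)
    dim = len(source_outputs[0])
    # Weighted sum of the source-specific output vectors.
    return [sum(w * out[i] for w, out in zip(weights, source_outputs))
            for i in range(dim)]


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0]]     # one hypothetical key per source
    outputs = [[1.0, 0.0], [0.0, 1.0]]  # one hypothetical output per source
    print(aggregate_sources([0.0, 0.0], keys, outputs))
    print(aggregate_sources([10.0, 0.0], keys, outputs))
```

With a neutral context the two sources contribute equally; with a context that matches the first source's key, the aggregated output is dominated by that source.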
AlpacaTag: An Active Learning-based Crowd Annotation Framework for Sequence Tagging
Bill Yuchen Lin | Dong-Ho Lee | Frank F. Xu | Ouyu Lan | Xiang Ren
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations
We introduce AlpacaTag, an open-source web-based data annotation framework for sequence tagging tasks such as named-entity recognition (NER). The distinctive advantages of AlpacaTag are three-fold. 1) Active intelligent recommendation: dynamically suggesting annotations and sampling the most informative unlabeled instances with a back-end actively learned model; 2) Automatic crowd consolidation: enhancing real-time inter-annotator agreement by merging inconsistent labels from multiple annotators; 3) Real-time model deployment: letting users deploy their models in downstream systems while new annotations are being made. AlpacaTag is a comprehensive solution for sequence labeling tasks, ranging from rapid tagging with recommendations powered by active learning and auto-consolidation of crowd annotations to real-time model deployment.
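Two of the ideas named in the abstract, merging inconsistent labels and sampling informative instances, can be sketched in a few lines. The abstract does not say which consolidation or sampling strategy AlpacaTag uses, so this sketch substitutes two common stand-ins: token-wise majority voting and least-confidence sampling. The function names `consolidate` and `least_confident` and the input formats are hypothetical and do not come from AlpacaTag.

```python
from collections import Counter
from typing import Dict, List


def consolidate(annotations: List[List[str]]) -> List[str]:
    """Merge several annotators' tag sequences by token-wise majority vote.

    Assumption: all sequences label the same tokens and have equal length.
    """
    merged = []
    for tags in zip(*annotations):
        # Pick the most frequent tag proposed for this token.
        merged.append(Counter(tags).most_common(1)[0][0])
    return merged


def least_confident(confidences: Dict[str, float], k: int) -> List[str]:
    """Return the k sentence ids whose model confidence is lowest.

    Assumption: `confidences` maps a sentence id to the model's probability
    for its best labeling (a hypothetical input format).
    """
    return sorted(confidences, key=confidences.get)[:k]


if __name__ == "__main__":
    crowd = [
        ["B-PER", "O", "O"],
        ["B-PER", "O", "B-LOC"],
        ["O", "O", "B-LOC"],
    ]
    print(consolidate(crowd))
    print(least_confident({"s1": 0.9, "s2": 0.4, "s3": 0.7}, k=2))
```

Majority voting treats every annotator as equally reliable; a real system might weight annotators differently, which this sketch does not attempt.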
- Bill Yuchen Lin 2
- Xiang Ren 2
- Dong-Ho Lee 1
- Frank F. Xu 1
- Xiao Huang 1