Liying Zheng
2022
AdaK-NER: An Adaptive Top-K Approach for Named Entity Recognition with Incomplete Annotations
Hongtao Ruan | Liying Zheng | Peixian Hu
Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)
State-of-the-art Named Entity Recognition (NER) models rely heavily on large amounts of fully annotated training data. However, accessible data are often incompletely annotated because annotators usually lack comprehensive knowledge of the target domain. Unannotated tokens are normally treated as non-entities by default, but we emphasize that these tokens could be either non-entities or part of any entity. Here, we study NER modeling with incompletely annotated data, where only a fraction of the named entities are labeled and each unlabeled token is treated as multi-labeled with every possible label. When multi-labeled tokens are taken into account, the numerous possible paths can distract the model from the gold path (the ground-truth label sequence) during training and thus hinder its learning ability. In this paper, we propose AdaK-NER, an adaptive top-K approach, to help the model focus on a smaller feasible region where the gold path is more likely to be located. We demonstrate the superiority of our approach through extensive experiments on both English and Chinese datasets, improving F-score on average by 2% on CoNLL-2003 and by over 10% on two Chinese datasets compared with prior state-of-the-art work.
2021
An Alignment-Agnostic Model for Chinese Text Error Correction
Liying Zheng | Yue Deng | Weishun Song | Liang Xu | Jing Xiao
Findings of the Association for Computational Linguistics: EMNLP 2021
This paper investigates how to correct Chinese text errors involving mistaken, missing, and redundant characters, which are common among native Chinese speakers. Most existing models based on the detect-correct framework can correct mistaken characters but cannot handle missing or redundant characters because of the inconsistency between model inputs and outputs. Although Seq2Seq-based and sequence tagging methods provide solutions to all three error types and achieve relatively good results in English, they do not perform well in Chinese according to our experiments. We propose a novel alignment-agnostic detect-correct framework that can handle both aligned and non-aligned text and can serve as a cold-start model when no annotated data are provided. Experimental results on three datasets demonstrate that our method is effective and outperforms most recently published models.
Co-authors
- Yue Deng 1
- Peixian Hu 1
- Hongtao Ruan 1
- Weishun Song 1
- Jing Xiao 1
- Liang Xu 1