Jinwen Luo
2023
Exploiting Hierarchically Structured Categories in Fine-grained Chinese Named Entity Recognition
Jiuding Yang
|
Jinwen Luo
|
Weidong Guo
|
Di Niu
|
Yu Xu
Findings of the Association for Computational Linguistics: ACL 2023
Chinese Named Entity Recognition (CNER) is a widely used technology in various applications. While recent studies have focused on utilizing additional information of the Chinese language and characters to enhance CNER performance, this paper focuses on a specific aspect of CNER known as fine-grained CNER (FG-CNER). FG-CNER involves the use of hierarchical, fine-grained categories (e.g. Person-MovieStar) to label named entities. To promote research in this area, we introduce the FiNE dataset, a dataset for FG-CNER consisting of 30,000 sentences from various domains and containing 67,651 entities in 54 fine-grained flattened hierarchical categories. Additionally, we propose SoftFiNE, a novel approach for FG-CNER that utilizes a custom-designed relevance scoring function based on label structures to learn the potential relevance between different flattened hierarchical labels. Our experimental results demonstrate that the proposed SoftFiNE method outperforms the state-of-the-art baselines on the FiNE dataset. Furthermore, we conduct extensive experiments on three other datasets, including OntoNotes 4.0, Weibo, and Resume, where SoftFiNE achieved state-of-the-art performance on all three datasets.
2022
MatRank: Text Re-ranking by Latent Preference Matrix
Jinwen Luo
|
Jiuding Yang
|
Weidong Guo
|
Chenglin Li
|
Di Niu
|
Yu Xu
Findings of the Association for Computational Linguistics: EMNLP 2022
Text ranking plays a key role in providing content that best answers user queries. It is usually divided into two sub-tasks to perform efficient information retrieval given a query: text retrieval and text re-ranking. Recent research on pretrained language models (PLM) has demonstrated efficiency and gain on both sub-tasks. However, while existing methods have benefited from pre-trained language models and achieved high recall rates on passage retrieval, the ranking performance still demands further improvement. In this paper, we propose MatRank, which learns to re-rank the text retrieved for a given query by learning to predict the most relevant passage based on a latent preference matrix. Specifically, MatRank uses a PLM to generate an asymmetric latent matrix of relative preference scores between all pairs of retrieved passages. Then, the latent matrix is aggregated row-wise and column-wise to obtain global preferences and predictions of the most relevant passage in two of these directions, respectively. We conduct extensive experiments on MS MACRO, WikiAQ, and SemEval datasets. Experimental results show that MatRank has achieved new state-of-the-art results on these datasets, outperforming all prior methods on ranking performance metrics.
2021
LICHEE: Improving Language Model Pre-training with Multi-grained Tokenization
Weidong Guo
|
Mingjun Zhao
|
Lusheng Zhang
|
Di Niu
|
Jinwen Luo
|
Zhenhua Liu
|
Zhenyang Li
|
Jianbo Tang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021