Align-smatch: A Novel Evaluation Method for Chinese Abstract Meaning Representation Parsing based on Alignment of Concept and Relation
Liming Xiao | Bin Li | Zhixing Xu | Kairui Huo | Minxuan Feng | Junsheng Zhou | Weiguang Qu
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Abstract Meaning Representation (AMR) is a sentence-level meaning representation that abstracts the meaning of a sentence into a rooted, directed acyclic graph. As the Chinese AMR corpus has grown, more and more researchers have developed systems that automatically parse sentences into Chinese AMR. However, current parsers cannot handle concept alignment and relation alignment, and the existing evaluation methods for AMR parsing do not account for them either. To fill this gap in Chinese AMR parsing evaluation, we build on the AMR evaluation metric smatch and improve its triple-generation algorithm to make it compatible with concept alignment and relation alignment. The result is a new integrated metric, align-smatch, for parsing evaluation. A comparative study on 20 manually annotated AMRs and their gold AMRs shows that align-smatch handles alignments well and is more robust in evaluating arcs. We also propose several fine-grained metrics for evaluating concept alignment, relation alignment, and implicit concepts, in order to further measure parsers' performance on subtasks.
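
As a rough illustration of the triple-based scoring that smatch-style metrics rest on, the sketch below computes precision, recall, and F1 over two sets of triples. It assumes the variable mapping between the two graphs is already fixed (real smatch searches for the best mapping), and it does not reproduce the alignment-aware triples of align-smatch. The function name `triple_f1` and the example triples are illustrative assumptions, not taken from the paper.

```python
def triple_f1(gold, predicted):
    """Precision, recall, and F1 over two collections of triples.

    Assumption: variables in both graphs are already mapped to the same
    names, so triples can be compared by plain equality.
    """
    gold, predicted = set(gold), set(predicted)
    matched = len(gold & predicted)
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical triples for "The boy wants to go".
gold_triples = [
    ("instance", "a", "want-01"),
    ("instance", "b", "boy"),
    ("instance", "c", "go-01"),
    ("ARG0", "a", "b"),
    ("ARG1", "a", "c"),
    ("ARG0", "c", "b"),
]
# A parser output that misses the re-entrant arc ARG0(c, b).
predicted_triples = gold_triples[:5]

precision, recall, f1 = triple_f1(gold_triples, predicted_triples)
```

Here every predicted triple is correct but one gold arc is missing, so precision is 1 while recall drops to 5/6.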

Automated Essay Scoring via Pairwise Contrastive Regression
Jiayi Xie | Kaiwei Cai | Li Kong | Junsheng Zhou | Weiguang Qu
Proceedings of the 29th International Conference on Computational Linguistics

Automated essay scoring (AES) involves predicting a score that reflects the writing quality of an essay. Most existing work in AES uses either regression objectives or ranking objectives. However, the two types of methods are highly complementary. To this end, we take inspiration from contrastive learning and propose a novel unified Neural Pairwise Contrastive Regression (NPCR) model in which both objectives are optimized simultaneously as a single loss. Specifically, we first design a neural pairwise ranking model to guarantee the global ranking order in a large list of essays, and then extend this pairwise ranking model to predict the relative scores between an input essay and several reference essays. Additionally, a multi-sample voting strategy is employed for inference. We use Quadratic Weighted Kappa to evaluate our model on the public Automated Student Assessment Prize (ASAP) dataset, and the experimental results demonstrate that NPCR outperforms previous methods by a large margin, achieving state-of-the-art average performance on the AES task.
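
For readers unfamiliar with the evaluation measure named above, here is a minimal sketch of Quadratic Weighted Kappa for integer ratings, using the standard definition with weights (i - j)^2 / (N - 1)^2. It is not the authors' evaluation script; the function name and signature are assumptions, and it expects at least two rating levels and ratings that are not all identical.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Quadratic Weighted Kappa between two equal-length lists of integer ratings."""
    num_levels = max_rating - min_rating + 1
    total = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts of (a, b) pairs
    hist_a = Counter(rater_a)                  # marginal counts for rater A
    hist_b = Counter(rater_b)                  # marginal counts for rater B

    numerator = 0.0    # weighted observed disagreement
    denominator = 0.0  # weighted disagreement expected by chance
    for i in range(min_rating, max_rating + 1):
        for j in range(min_rating, max_rating + 1):
            weight = (i - j) ** 2 / (num_levels - 1) ** 2
            numerator += weight * observed[(i, j)]
            denominator += weight * hist_a[i] * hist_b[j] / total
    return 1.0 - numerator / denominator


perfect = quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 3, 4], 1, 4)
reversed_scores = quadratic_weighted_kappa([1, 2, 3], [3, 2, 1], 1, 3)
```

Identical ratings give a kappa of 1, while the fully reversed ratings in the second call give -1.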

基于特征融合的汉语被动句自动识别研究(Automatic Recognition of Chinese Passive Sentences Based on Feature Fusion)
Kang Hu (胡康) | Weiguang Qu (曲维光) | Tingxin Wei (魏庭新) | Junsheng Zhou (周俊生) | Yanhui Gu (顾彦慧) | Bin Li (李斌)
Proceedings of the 21st Chinese National Conference on Computational Linguistics


The First International Ancient Chinese Word Segmentation and POS Tagging Bakeoff: Overview of the EvaHan 2022 Evaluation Campaign
Bin Li | Yiguo Yuan | Jingya Lu | Minxuan Feng | Chao Xu | Weiguang Qu | Dongbo Wang
Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages

This paper presents the results of the First Ancient Chinese Word Segmentation and POS Tagging Bakeoff (EvaHan), which was held at the Second Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) 2022, in the context of the 13th Edition of the Language Resources and Evaluation Conference (LREC 2022). We give the motivation for holding an international shared contest and describe the data and tracks. The contest consists of two modalities, closed and open. In the closed modality, participants are only allowed to use the training data, and the highest F1 scores are 96.03% and 92.05% in word segmentation and POS tagging, respectively. In the open modality, participants can use whatever resources they have, and the highest F1 scores are 96.34% and 92.56% in word segmentation and POS tagging, respectively. The scores on the blind test dataset decrease by around 3 points, which shows that out-of-vocabulary words are still the bottleneck for lexical analyzers.
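
The word segmentation F1 reported above is commonly computed by comparing word spans between the gold and predicted segmentations. The sketch below shows that common span-based formulation; it is an assumption about the general technique, not the official EvaHan scorer, and the helper names and the example sentence are illustrative.

```python
def to_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_f1(gold_words, predicted_words):
    """Span-based precision, recall, and F1 for one segmented sentence."""
    gold, predicted = to_spans(gold_words), to_spans(predicted_words)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted segmentations of the same five characters.
gold = ["学而", "时", "习", "之"]
predicted = ["学", "而", "时", "习之"]
p, r, f1 = segmentation_f1(gold, predicted)
```

Only the span covering "时" matches in this example, so one of four predicted words and one of four gold words are correct.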


中文连动句语义关系识别研究(Research on Semantic Relation Recognition of Chinese Serial-verb Sentences)
Chao Sun (孙超) | Weiguang Qu (曲维光) | Tingxin Wei (魏庭新) | Yanhui Gu (顾彦慧) | Bin Li (李斌) | Junsheng Zhou (周俊生)
Proceedings of the 20th Chinese National Conference on Computational Linguistics


中文词语离合现象识别研究(Research on Recognition of the Separation and Reunion Phenomena of Words in Chinese)
Lou Zhou (周露) | Weiguang Qu (曲维光) | Tingxin Wei (魏庭新) | Junsheng Zhou (周俊生) | Bin Li (李斌) | Yanhui Gu (顾彦慧)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

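The separation and reunion of words is a special phenomenon in Chinese in which a word can be used either as a whole or split apart. This paper uses a character-level sequence labeling method to automatically recognize the separated usage of two-character verbs, which avoids error propagation from Chinese word segmentation and POS tagging and saves the manual effort of writing matching rules and feature templates. During training, we fine-tune the Chinese pre-trained BERT model to obtain task-oriented character vector representations, and we introduce a masking mechanism that hides the separated word from the model in its separated usage. This reduces the influence of the word itself on the recognition result and strengthens the learning of the inserted components. Different masks are used for the first and second morphemes to emphasize their order of occurrence, which gives the model the ability to recognize complex and occasional separated usages. To obtain sentence representations that contain contextual information, the original sentence representation and the masked sentence representation are fed into two BiLSTM layers with different parameters, and finally a CRF is used to capture the dependencies in the sentence's label sequence. The proposed BERT MASK + 2BiLSTMs + CRF model improves the F1 score by 2.85% over the best existing separable-word recognition model.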

Event Detection as Graph Parsing
Jianye Xie | Haotong Sun | Junsheng Zhou | Weiguang Qu | Xinyu Dai
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021


Construct a Sense-Frame Aligned Predicate Lexicon for Chinese AMR Corpus
Li Song | Yuling Dai | Yihuan Liu | Bin Li | Weiguang Qu
Proceedings of the Twelfth Language Resources and Evaluation Conference

The study of predicate frames is an important topic in semantic analysis. Abstract Meaning Representation (AMR) is an emerging graph-based semantic representation of a sentence. Since the core semantic roles defined in the predicate lexicon form the backbone of an AMR graph, the construction of the lexicon becomes the key issue. Existing lexicons blur the senses and frames of predicates and need to be refined to serve tasks such as word sense disambiguation and event extraction. This paper introduces an on-going project to construct a novel predicate lexicon for the Chinese AMR corpus. The new lexicon includes 14,389 senses and 10,800 frames of 8,470 words. As some senses can be aligned to more than one frame, and vice versa, we found that the alignment between senses and frames is not simply one frame per sense. Explicit analysis is given for the multiple aligned relations, which demonstrates the necessity of the proposed lexicon for the AMR corpus and supplies real data for theoretical linguistic studies.

An Element-aware Multi-representation Model for Law Article Prediction
Huilin Zhong | Junsheng Zhou | Weiguang Qu | Yunfei Long | Yanhui Gu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Existing work has shown that using law articles as external knowledge can improve the performance of Legal Judgment Prediction. However, these approaches do not make full use of law article information, and most current work handles only single-label samples. In this paper, we propose a Law Article Element-aware Multi-representation Model (LEMM), which makes full use of law article information and can be applied to multi-label samples. The model uses the labeled elements of law articles to extract fact description features from multiple angles. It generates multiple representations of a fact for classification. Every label has a law-aware fact representation to encode more information. To capture the dependencies between law articles, the model also introduces a self-attention mechanism between the multiple representations. Compared with baseline models such as TopJudge, this model improves accuracy by 5.84%, macro F1 by 6.42%, and micro F1 by 4.28%.

多轮对话的篇章级抽象语义表示标注体系研究(Research on Discourse-level Abstract Meaning Representation Annotation framework in Multi-round Dialogue)
Tong Huang (黄彤) | Bin Li (李斌) | Peiyi Yan (闫培艺) | Tingting Ji (计婷婷) | Weiguang Qu (曲维光)
Proceedings of the 19th Chinese National Conference on Computational Linguistics


基于抽象语义表示的汉语疑问句的标注与分析(Chinese Interrogative Sentences Annotation and Analysis Based on the Abstract Meaning Representation)
Peiyi Yan (闫培艺) | Bin Li (李斌) | Tong Huang (黄彤) | Kairui Huo (霍凯蕊) | Jin Chen (陈瑾) | Weiguang Qu (曲维光)
Proceedings of the 19th Chinese National Conference on Computational Linguistics


基于神经网络的连动句识别(Recognition of serial-verb sentences based on Neural Network)
Chao Sun (孙超) | Weiguang Qu (曲维光) | Tingxin Wei (魏庭新) | Yanhui Gu (顾彦慧) | Bin Li (李斌) | Junsheng Zhou (周俊生)
Proceedings of the 19th Chinese National Conference on Computational Linguistics


基于深度学习的实体关系抽取研究综述(Review of Entity Relation Extraction based on deep learning)
Zhentao Xia (夏振涛) | Weiguang Qu (曲维光) | Yanhui Gu (顾彦慧) | Junsheng Zhou (周俊生) | Bin Li (李斌)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

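As a core subtask of information extraction, entity relation extraction is important for natural language processing applications such as knowledge graphs, intelligent question answering, and semantic search. Relation extraction aims to automatically identify the semantic relations that hold between entities in unstructured text. This paper focuses on sentence-level relation extraction, introduces the main datasets used for the task, and describes existing techniques, which fall into three main groups: supervised relation extraction, distantly supervised relation extraction, and joint extraction of entities and relations. We compare the models used for the task and analyze their contributions and shortcomings. Finally, we introduce the current state of research on Chinese entity relation extraction and the methods used.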

面向中文AMR标注体系的兼语语料库构建及识别研究(Research on the Construction and Recognition of Concurrent corpus for Chinese AMR Annotation System)
Wenhui Hou (侯文惠) | Weiguang Qu (曲维光) | Tingxin Wei (魏庭新) | Bin Li (李斌) | Yanhui Gu (顾彦慧) | Junsheng Zhou (周俊生)
Proceedings of the 19th Chinese National Conference on Computational Linguistics



Building a Chinese AMR Bank with Concept and Relation Alignments
Bin Li | Yuan Wen | Li Song | Weiguang Qu | Nianwen Xue
Linguistic Issues in Language Technology, Volume 18, 2019 - Exploiting Parsed Corpora: Applications in Research, Pedagogy, and Processing

Abstract Meaning Representation (AMR) is a meaning representation framework in which the meaning of a full sentence is represented as a single-rooted, acyclic, directed graph. In this article, we describe an on-going project to build a Chinese AMR (CAMR) corpus, which currently includes 10,149 sentences from the newsgroup and weblog portion of the Chinese TreeBank (CTB). We describe the annotation specifications for the CAMR corpus, which follow the annotation principles of English AMR but make adaptations where needed to accommodate the linguistic facts of Chinese. The CAMR specifications also include a systematic treatment of sentence-internal discourse relations. One significant change we have made to the AMR annotation methodology is the inclusion of the alignment between word tokens in the sentence and the concepts/relations in the CAMR annotation, which makes it easier for automatic parsers to model the correspondence between a sentence and its meaning representation. We develop an annotation tool for CAMR, and the inter-annotator agreement between the two annotators, as measured by the Smatch score, is 0.83, indicating reliable annotation. We also present some quantitative analysis of the CAMR corpus: 46.71% of the sentences have AMRs that are non-tree graphs. Moreover, the AMRs of 88.95% of the sentences contain concepts that are inferred from the context of the sentence but do not correspond to a specific word.

Ellipsis in Chinese AMR Corpus
Yihuan Liu | Bin Li | Peiyi Yan | Li Song | Weiguang Qu
Proceedings of the First International Workshop on Designing Meaning Representations

Ellipsis is very common in language. Natural language processing needs to restore the elided elements in a sentence. However, only a few corpora annotate ellipsis, which hinders its automatic detection and recovery. This paper introduces the annotation of ellipsis in Chinese sentences using a novel graph-based representation, Abstract Meaning Representation (AMR), which has a good mechanism for restoring the elided elements manually. We annotate 5,000 sentences selected from the Chinese TreeBank (CTB). We find that 54.98% of sentences have ellipses. 92% of the ellipses are restored by copying the antecedents' concepts, and 12.9% of them are newly added concepts. In addition, we find that the elided element is a word or phrase in most cases, but sometimes only the head of a phrase or part of a phrase, which makes automatic recovery of ellipsis rather hard.


A Fast Approach for Semantic Similar Short Texts Retrieval
Yanhui Gu | Zhenglu Yang | Junsheng Zhou | Weiguang Qu | Jinmao Wei | Xingtian Shi
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

AMR Parsing with an Incremental Joint Model
Junsheng Zhou | Feiyu Xu | Hans Uszkoreit | Weiguang Qu | Ran Li | Yanhui Gu
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

Annotating the Little Prince with Chinese AMRs
Bin Li | Yuan Wen | Weiguang Qu | Lijun Bu | Nianwen Xue
Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016 (LAW-X 2016)


现代汉语语义词典多义词词库的校正和再修订(New Editing and Checking Work of the Semantic Knowledge Base of Contemporary Chinese (SKCC))[In Chinese]
Yunfei Long | Yuefeng Bian | Weiguang Qu | Rubing Dai
Proceedings of the 27th Conference on Computational Linguistics and Speech Processing (ROCLING 2015)

Dependency parsing for Chinese long sentence: A second-stage main structure parsing method
Bo Li | Yunfei Long | Weiguang Qu
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation: Posters


Exploiting Chunk-level Features to Improve Phrase Chunking
Junsheng Zhou | Weiguang Qu | Fen Zhang
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning


A Semi-Supervised Key Phrase Extraction Approach: Learning from Title Phrases through a Document Semantic Network
Decong Li | Sujian Li | Wenjie Li | Wei Wang | Weiguang Qu
Proceedings of the ACL 2010 Conference Short Papers

Semi-Supervised WSD in Selectional Preferences with Semantic Redundancy
Xuri Tang | Xiaohe Chen | Weiguang Qu | Shiwen Yu
Coling 2010: Posters