Weiwei Sun

Shandong University


2024

MAIR: A Massive Benchmark for Evaluating Instructed Retrieval
Weiwei Sun | Zhengliang Shi | Wu Jiu Long | Lingyong Yan | Xinyu Ma | Yiding Liu | Min Cao | Dawei Yin | Zhaochun Ren
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Recent information retrieval (IR) models are pre-trained and instruction-tuned on massive datasets and tasks, enabling them to perform well on a wide range of tasks and potentially generalize to unseen tasks with instructions. However, existing IR benchmarks focus on a limited scope of tasks, making them insufficient for evaluating the latest IR models. In this paper, we propose MAIR (Massive Instructed Retrieval Benchmark), a heterogeneous IR benchmark that includes 126 distinct IR tasks across 6 domains, collected from existing datasets. We benchmark state-of-the-art instruction-tuned text embedding models and re-ranking models. Our experiments reveal that instruction-tuned models generally achieve superior performance compared to non-instruction-tuned models on MAIR. Additionally, our results suggest that current instruction-tuned text embedding models and re-ranking models still lack effectiveness on specific long-tail tasks. MAIR is publicly available at https://github.com/sunnweiwei/Mair.
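
To make the evaluation setup concrete, here is a minimal sketch of how an instructed retriever can be scored: the task instruction is prepended to the query before embedding, and documents are ranked by cosine similarity. The `embed` function is a stand-in for any instruction-tuned embedding model; it is not part of the MAIR release.

```python
# Hypothetical sketch of instructed retrieval scoring; `embed` is a placeholder.
import numpy as np

def embed(texts):
    # Placeholder: a real instruction-tuned embedder goes here.
    rng = np.random.default_rng(0)
    vecs = rng.normal(size=(len(texts), 8))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)  # unit-normalise

def rank(instruction, query, docs):
    """Prepend the task instruction to the query, then rank docs by cosine."""
    q = embed([f"Instruct: {instruction}\nQuery: {query}"])
    d = embed(docs)
    scores = (d @ q.T).squeeze(1)  # cosine similarity of unit vectors
    return sorted(zip(docs, scores), key=lambda x: -x[1])

print(rank("Retrieve legal cases relevant to the query.",
           "breach of contract damages",
           ["case A ...", "case B ..."]))
```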

EtymoLink: A Structured English Etymology Dataset
Yuan Gao | Weiwei Sun
Proceedings of the 5th Workshop on Computational Approaches to Historical Language Change

Knowing What LLMs DO NOT Know: A Simple Yet Effective Self-Detection Method
Yukun Zhao | Lingyong Yan | Weiwei Sun | Guoliang Xing | Chong Meng | Shuaiqiang Wang | Zhicong Cheng | Zhaochun Ren | Dawei Yin
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Large Language Models (LLMs) have shown great potential in Natural Language Processing (NLP) tasks. However, recent literature reveals that LLMs hallucinate intermittently, which impedes their reliability for further utilization. In this paper, we propose a novel self-detection method to detect which questions an LLM does not know. Compared with state-of-the-art methods, our proposal is empirical and remains applicable as LLMs are continually upgraded. Specifically, we examine the divergence of the LLM’s behaviors on different verbalizations of a question and examine the atypicality of the verbalized input. We combine the two components to identify whether the model generates a non-factual response to the question. Both components can be computed by the LLM itself, without reference to any other external resources. We conduct comprehensive experiments and demonstrate the effectiveness of our method for recently released LLMs, including Llama 2, Vicuna, ChatGPT, and GPT-4, across factoid question-answering, arithmetic reasoning, and commonsense reasoning tasks.
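
A minimal sketch of the divergence component, under our reading of the abstract: sample answers for several re-verbalizations of one question and measure how much they disagree. `ask_llm` and `paraphrase` are hypothetical callables, not the paper's exact procedure.

```python
# Illustrative divergence-over-verbalizations check (not the paper's code).
from collections import Counter
import math

def answer_entropy(answers):
    """Shannon entropy over distinct normalised answers; higher = less known."""
    counts = Counter(a.strip().lower() for a in answers)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def self_detect(question, paraphrase, ask_llm, k=5, threshold=1.0):
    verbalizations = [question] + [paraphrase(question) for _ in range(k - 1)]
    answers = [ask_llm(v) for v in verbalizations]
    return answer_entropy(answers) > threshold  # True -> likely "does not know"
```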

UG-schematic Annotation for Event Nominals: A Case Study in Mandarin Chinese
Wenxi Li | Yutong Zhang | Guy Emerson | Weiwei Sun
Computational Linguistics, Volume 50, Issue 2 - June 2024

Divergence of languages observed at the surface level is a major challenge for multilingual data representation, especially when typologically distant languages are involved. Drawing inspiration from a formalist Chomskyan perspective towards language universals, Universal Grammar (UG), this article uses deductively pre-defined universals to analyze a multilingually heterogeneous phenomenon, event nominals. In this way, the deeper universality of event nominals beneath their huge divergence across languages is uncovered, which empowers us to break barriers between languages and thus extend insights from some synthetic languages to a non-inflectional language, Mandarin Chinese. Our empirical investigation also demonstrates that this UG-inspired schema is effective: with its assistance, the inter-annotator agreement (IAA) for identifying event nominals in Mandarin grows from 88.02% to 94.99%, and automatic detection of event-reading nominalizations on the newly established data achieves an accuracy of 94.76% and an F1 score of 91.3%, which significantly surpass those achieved on the pre-existing resource by 9.8% and 5.2%, respectively. Our systematic analysis also sheds light on nominal semantic role labeling: by providing a clear definition and classification of the arguments of event nominals, the IAA of this task significantly increases from 90.46% to 98.04%.

MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter
Jitai Hao | Weiwei Sun | Xin Xin | Qi Meng | Zhumin Chen | Pengjie Ren | Zhaochun Ren
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Parameter-Efficient Fine-tuning (PEFT) facilitates the fine-tuning of Large Language Models (LLMs) under limited resources. However, fine-tuning performance with PEFT on complex, knowledge-intensive tasks is limited due to the constrained model capacity, which stems from the limited number of additional trainable parameters. To overcome this limitation, we introduce a novel mechanism that fine-tunes LLMs with larger adapters while remaining memory-efficient. This is achieved by exploiting the inherent activation sparsity in the Feed-Forward Networks (FFNs) of LLMs and the larger capacity of Central Processing Unit (CPU) memory compared to Graphics Processing Unit (GPU) memory. We store and update the parameters of the larger adapters on the CPU. Moreover, we employ a Mixture-of-Experts (MoE)-like architecture to avoid unnecessary CPU computation and reduce the communication volume between GPU and CPU, which is particularly important given the limited bandwidth of PCI Express (PCIe). Our method achieves fine-tuning results comparable to those obtained with larger memory capacities, even when operating under more limited resources such as a single GPU with 24GB of memory, with an acceptable loss in training efficiency. Our code is available at https://github.com/CURRENTF/MEFT.
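
A toy sketch (not the released MEFT code) of the core idea: a large adapter's weights stay in CPU memory, a small router picks the few neurons a token activates, and only those rows would cross the PCIe bus to the GPU. Dimensions and names are illustrative; run as-is, this executes on a single device.

```python
import torch

class SparseCPUAdapter(torch.nn.Module):
    def __init__(self, d_model=64, d_adapter=4096, k=8):
        super().__init__()
        self.router = torch.nn.Linear(d_model, d_adapter)  # small; GPU-resident
        # Large matrices; in MEFT-style training these stay in (pinned) CPU RAM.
        self.up = torch.nn.Parameter(torch.randn(d_adapter, d_model) * 0.02)
        self.down = torch.nn.Parameter(torch.randn(d_adapter, d_model) * 0.02)
        self.k = k

    def forward(self, h):                        # h: (tokens, d_model)
        scores = self.router(h)                  # cheap routing matmul
        idx = scores.topk(self.k, dim=-1).indices.unique()
        up, down = self.up[idx], self.down[idx]  # fetch only the activated rows
        return torch.relu(h @ up.T) @ down       # MoE-like sparse adapter FFN

h = torch.randn(5, 64)
print(SparseCPUAdapter()(h).shape)               # torch.Size([5, 64])
```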

Generate-then-Ground in Retrieval-Augmented Generation for Multi-hop Question Answering
Zhengliang Shi | Shuo Zhang | Weiwei Sun | Shen Gao | Pengjie Ren | Zhumin Chen | Zhaochun Ren
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The Multi-Hop Question Answering (MHQA) task presents a significant challenge for large language models (LLMs) due to the intensive knowledge required. Current solutions, like Retrieval-Augmented Generation, typically retrieve potentially relevant documents from an external corpus from which an answer is read. However, the performance of this retrieve-then-read paradigm is constrained by the retriever and the inevitable noise in the retrieved documents. To mitigate these challenges, we introduce a novel generate-then-ground (GenGround) framework, synergizing the parametric knowledge of LLMs and external documents to solve a multi-hop question. GenGround empowers LLMs to alternate between two phases until the final answer is derived: (1) formulate a simpler, single-hop question and directly generate the answer; (2) ground the question-answer pair in retrieved documents, amending any wrong predictions in the answer. We also propose an instructional grounding distillation method to generalize our method to smaller models. Extensive experiments conducted on four datasets illustrate the superiority of our method. To further facilitate future research, we have collected a dataset that traces the reasoning process.
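
An illustrative sketch of the alternating generate-then-ground loop as the abstract describes it. `llm` and `retrieve` are hypothetical callables, and the prompts are ours, not the paper's.

```python
def gen_ground(question, llm, retrieve, max_hops=4):
    trace = []
    for _ in range(max_hops):
        # Phase 1: pose a simpler single-hop sub-question and answer it directly.
        step = llm(f"Question: {question}\nSolved so far: {trace}\n"
                   "Pose the next single-hop sub-question and answer it.")
        # Phase 2: ground the QA pair in retrieved documents, fixing errors.
        docs = retrieve(step)
        step = llm(f"Sub-question and tentative answer: {step}\n"
                   f"Evidence: {docs}\nRevise the answer if the evidence disagrees.")
        trace.append(step)
        if llm(f"Question: {question}\nSteps: {trace}\n"
               "Is the final answer known? Reply yes or no.") == "yes":
            break
    return llm(f"Question: {question}\nSteps: {trace}\nGive the final answer.")
```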

How Large Language Models Encode Context Knowledge? A Layer-Wise Probing Study
Tianjie Ju | Weiwei Sun | Wei Du | Xinwei Yuan | Zhaochun Ren | Gongshen Liu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Previous work has showcased the intriguing capability of large language models (LLMs) in retrieving facts and processing context knowledge. However, only limited research exists on the layer-wise capability of LLMs to encode knowledge, which challenges our understanding of their internal mechanisms. In this paper, we make the first attempt to investigate the layer-wise capability of LLMs through probing tasks. We leverage the powerful generative capability of ChatGPT to construct probing datasets, providing diverse and coherent evidence corresponding to various facts. We employ $\mathcal{V}$-usable information as the validation metric to better reflect the capability of encoding context knowledge across different layers. Our experiments on conflicting and newly acquired knowledge show that LLMs: (1) prefer to encode more context knowledge in the upper layers; (2) primarily encode context knowledge within knowledge-related entity tokens at lower layers while progressively expanding more knowledge within other tokens at upper layers; and (3) gradually forget earlier context knowledge retained within the intermediate layers when provided with irrelevant evidence. Code is publicly available at https://github.com/Jometeorie/probing_llama.
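
For reference, the validation metric follows the standard definition of $\mathcal{V}$-usable information (Xu et al., 2020); in this setting the predictor family $\mathcal{V}$ would be the probes reading a given layer's representations (our reading of the abstract). A sketch of the definition:

```latex
% Standard definition of V-usable information (added for reference;
% V is the predictor family, e.g. probes over one layer's representations).
\[
  H_{\mathcal{V}}(Y) = \inf_{f \in \mathcal{V}} \mathbb{E}\!\left[-\log_2 f[\varnothing](Y)\right],
  \qquad
  H_{\mathcal{V}}(Y \mid X) = \inf_{f \in \mathcal{V}} \mathbb{E}\!\left[-\log_2 f[X](Y)\right],
\]
\[
  I_{\mathcal{V}}(X \to Y) = H_{\mathcal{V}}(Y) - H_{\mathcal{V}}(Y \mid X).
\]
```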

Improving the Robustness of Large Language Models via Consistency Alignment
Yukun Zhao | Lingyong Yan | Weiwei Sun | Guoliang Xing | Shuaiqiang Wang | Chong Meng | Zhicong Cheng | Zhaochun Ren | Dawei Yin
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Large language models (LLMs) have shown tremendous success in following user instructions and generating helpful responses. Nevertheless, their robustness is still far from optimal, as they may generate significantly inconsistent responses due to minor changes in the verbalized instructions. Recent literature has explored this inconsistency issue, highlighting the importance of continued improvement in the robustness of response generation. However, systematic analyses and solutions are still lacking. In this paper, we quantitatively define the inconsistency problem and propose a two-stage training framework consisting of instruction-augmented supervised fine-tuning and consistency alignment training. The first stage helps a model generalize in instruction following via similar instruction augmentations. In the second stage, we improve diversity and help the model distinguish subtle differences among similar responses, so that it understands which responses are more aligned with human expectations. The training process is accomplished via self-rewards inferred from the model trained in the first stage, without reference to external human preference resources. We conduct extensive experiments on recent publicly available LLMs on instruction-following tasks and demonstrate the effectiveness of our training framework.
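
A rough sketch of a consistency self-reward in the spirit of the abstract: responses to paraphrased instructions are compared pairwise, and more mutually consistent responses earn a higher reward. The token-overlap metric is our simplification, not the paper's exact scoring.

```python
def token_f1(a, b):
    """Set-level token F1 between two responses (a toy agreement measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    overlap = len(sa & sb)
    p, r = overlap / len(sa), overlap / len(sb)
    return 2 * p * r / (p + r) if p + r else 0.0

def consistency_reward(response, sibling_responses):
    """Mean agreement with responses generated for paraphrased instructions."""
    return sum(token_f1(response, s) for s in sibling_responses) / len(sibling_responses)
```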

2023

Answering Ambiguous Questions via Iterative Prompting
Weiwei Sun | Hengyi Cai | Hongshen Chen | Pengjie Ren | Zhumin Chen | Maarten de Rijke | Zhaochun Ren
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In open-domain question answering, multiple plausible answers may exist due to the ambiguity of questions. To provide feasible answers to an ambiguous question, one approach is to directly predict all valid answers, but this can struggle to balance relevance and diversity. An alternative is to gather candidate answers and aggregate them, but this method can be computationally costly and may neglect dependencies among answers. In this paper, we present AmbigPrompt to address the imperfections of existing approaches to answering ambiguous questions. Specifically, we integrate an answering model with a prompting model in an iterative manner. The prompting model adaptively tracks the reading process and progressively triggers the answering model to compose distinct and relevant answers. Additionally, we develop a task-specific post-pretraining approach for both the answering model and the prompting model, which greatly improves the performance of our framework. Empirical studies on two commonly used open benchmarks show that AmbigPrompt achieves state-of-the-art or competitive results while using less memory and incurring lower inference latency than competing approaches. AmbigPrompt also performs well in low-resource settings.
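
A schematic sketch of the iterative loop as described above: the prompting model conditions on the answers found so far and triggers the answering model for the next distinct answer. Both callables are placeholders, not AmbigPrompt's actual models.

```python
def ambig_answer(question, prompting_model, answering_model, max_answers=8):
    answers = []
    while len(answers) < max_answers:
        prompt = prompting_model(question, answers)  # tracks what is covered
        nxt = answering_model(question, prompt)      # composes a new answer
        if nxt is None or nxt in answers:            # no further valid answer
            break
        answers.append(nxt)
    return answers
```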

RADE: Reference-Assisted Dialogue Evaluation for Open-Domain Dialogue
Zhengliang Shi | Weiwei Sun | Shuo Zhang | Zhen Zhang | Pengjie Ren | Zhaochun Ren
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Evaluating open-domain dialogue systems is challenging for reasons such as the one-to-many problem, i.e., there are many appropriate responses beyond the single golden response. As of now, automatic evaluation methods need better consistency with humans, while reliable human evaluation can be time- and cost-intensive. To this end, we propose the Reference-Assisted Dialogue Evaluation (RADE) approach under the multi-task learning framework, which leverages a pre-created utterance as a reference, in addition to the gold response, to relieve the one-to-many problem. Specifically, RADE explicitly compares the reference and the candidate response to predict their overall scores. Moreover, an auxiliary response generation task enhances prediction via a shared encoder. To support RADE, we extend three datasets with additional human-annotated rated responses beyond the single golden response. Experiments on our three datasets and two existing benchmarks demonstrate the effectiveness of our method: Pearson, Spearman, and Kendall correlations with human evaluation outperform state-of-the-art baselines.

Constructivist Tokenization for English
Allison Fan | Weiwei Sun
Proceedings of the First International Workshop on Construction Grammars and NLP (CxGs+NLP, GURT/SyntaxFest 2023)

This paper revisits tokenization from a theoretical perspective, and argues for the necessity of a constructivist approach to tokenization for semantic parsing and modeling language acquisition. We consider two problems: (1) (semi-) automatically converting existing lexicalist annotations, e.g. those of the Penn TreeBank, into constructivist annotations, and (2) automatic tokenization of raw texts. We demonstrate that (1) a heuristic rule-based constructivist tokenizer is able to yield relatively satisfactory accuracy when gold standard Penn TreeBank part-of-speech tags are available, but that some manual annotations are still necessary to obtain gold standard results, and (2) a neural tokenizer is able to provide accurate automatic constructivist tokenization results from raw character sequences. Our research output also includes a set of high-quality morpheme-tokenized corpora, which enable the training of computational models that more closely align with language comprehension and acquisition.

Generative Knowledge Selection for Knowledge-Grounded Dialogues
Weiwei Sun | Pengjie Ren | Zhaochun Ren
Findings of the Association for Computational Linguistics: EACL 2023

Knowledge selection is key in knowledge-grounded dialogue (KGD): it aims to select an appropriate knowledge snippet to be used in an utterance based on the dialogue history. Previous studies mainly employ a classification approach that classifies each candidate snippet as “relevant” or “irrelevant” independently. However, such approaches neglect the interactions between snippets, leading to difficulties in inferring the meaning of snippets. Moreover, they lack modeling of the discourse structure of dialogue-knowledge interactions. We propose a simple yet effective generative approach for knowledge selection, called GenKS. GenKS learns to select snippets by generating their identifiers with a sequence-to-sequence model, and therefore captures intra-knowledge interaction inherently through attention mechanisms. Meanwhile, we devise a hyperlink mechanism to model the dialogue-knowledge interactions explicitly. We conduct experiments on three benchmark datasets and verify that GenKS achieves the best results on both knowledge selection and response generation.
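
A minimal sketch of selection-as-generation: each candidate snippet gets an identifier in the serialized input, and a sequence-to-sequence model emits the identifier of its choice. `seq2seq` is a placeholder for any encoder-decoder LM, and the serialization format is ours, not GenKS's exact one.

```python
import re

def select_snippet(history, snippets, seq2seq):
    src = ("dialogue: " + " | ".join(history) +
           " knowledge: " + " ".join(f"<K{i}> {s}" for i, s in enumerate(snippets)))
    out = seq2seq(src)                         # e.g. generates the string "<K2>"
    idx = int(re.search(r"\d+", out).group())  # parse the chosen identifier
    return snippets[idx]
```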

DiQAD: A Benchmark Dataset for Open-domain Dialogue Quality Assessment
Yukun Zhao | Lingyong Yan | Weiwei Sun | Chong Meng | Shuaiqiang Wang | Zhicong Cheng | Zhaochun Ren | Dawei Yin
Findings of the Association for Computational Linguistics: EMNLP 2023

Dialogue assessment plays a critical role in the development of open-domain dialogue systems. Existing work is unable to provide an end-to-end, human-epistemic assessment dataset: it either covers only sub-metrics such as coherence, or relies on dialogues conducted between annotators, far from real user settings. In this paper, we release a large-scale dialogue quality assessment dataset (DiQAD) for automatically assessing open-domain dialogue quality. Specifically, we (1) establish assessment criteria based on dimensions that conform to human judgements of dialogue quality, and (2) annotate large-scale dialogues conducted between real users based on these criteria; the dataset contains around 100,000 dialogues. We conduct several experiments and report the performance of baselines as the benchmark on DiQAD. The dataset is openly accessible at https://github.com/yukunZhao/Dataset_Dialogue_quality_evaluation.

Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents
Weiwei Sun | Lingyong Yan | Xinyu Ma | Shuaiqiang Wang | Pengjie Ren | Zhumin Chen | Dawei Yin | Zhaochun Ren
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Large Language Models (LLMs) have demonstrated remarkable zero-shot generalization across various language-related tasks, including search engines. However, existing work utilizes the generative ability of LLMs for Information Retrieval (IR) rather than direct passage ranking. The discrepancy between the pre-training objectives of LLMs and the ranking objective poses another challenge. In this paper, we first investigate generative LLMs such as ChatGPT and GPT-4 for relevance ranking in IR. Surprisingly, our experiments reveal that properly instructed LLMs can deliver competitive, even superior, results to state-of-the-art supervised methods on popular IR benchmarks. Furthermore, to address concerns about data contamination in LLMs, we collect a new test set called NovelEval, based on the latest knowledge, to verify the model’s ability to rank unknown knowledge. Finally, to improve efficiency in real-world applications, we delve into the potential for distilling the ranking capabilities of ChatGPT into small specialized models using a permutation distillation scheme. Our evaluation shows that a distilled 440M model outperforms a 3B supervised model on the BEIR benchmark. The code to reproduce our results is available at www.github.com/sunnweiwei/RankGPT.
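
A compact sketch of permutation-style LLM re-ranking with a sliding window, in the spirit of this line of work (the linked repository contains the actual implementation). `llm_rank` is a placeholder that returns a permutation of the window's indices.

```python
def sliding_window_rerank(query, docs, llm_rank, window=20, stride=10):
    order = list(range(len(docs)))
    # Slide from the bottom of the list to the top so good docs bubble upward.
    start = max(len(order) - window, 0)
    while True:
        chunk = order[start:start + window]
        perm = llm_rank(query, [docs[i] for i in chunk])  # e.g. [3, 0, 2, 1, ...]
        order[start:start + window] = [chunk[p] for p in perm]
        if start == 0:
            break
        start = max(start - stride, 0)
    return [docs[i] for i in order]
```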

2021

Universal Semantic Tagging for English and Mandarin Chinese
Wenxi Li | Yiyang Hou | Yajie Ye | Li Liang | Weiwei Sun
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Universal Semantic Tagging aims to provide lightweight unified analysis for all languages at the word level. Though the proposed annotation scheme is conceptually promising, its feasibility has only been examined in four Indo-European languages. This paper is concerned with extending the annotation scheme to handle Mandarin Chinese and empirically studying the plausibility of unifying meaning representations for multiple languages. We discuss a set of language-specific semantic phenomena, propose new annotation specifications, and build a richly annotated corpus. The corpus consists of 1100 English-Chinese parallel sentences, for which compositional semantic analysis is available for English, and another 1000 Chinese sentences, which have enriched syntactic analysis. By means of the new annotations, we also evaluate a series of neural tagging models to gauge how successful semantic tagging can be: accuracies of 92.7% and 94.6% are obtained for Chinese and English, respectively. The English tagging performance is remarkably better than the state of the art, by 7.7%.

Negation Scope Resolution for Chinese as a Second Language
Mengyu Zhang | Weiqi Wang | Shuqiao Sun | Weiwei Sun
Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications

This paper studies Negation Scope Resolution (NSR) for Chinese as a Second Language (CSL), which shows many unique characteristics that distinguish it from “standard” Chinese. We annotate a new moderate-sized corpus that covers two L1 background languages, viz. English and Japanese. We build a neural NSR system, which achieves a new state-of-the-art accuracy on English benchmark data, and leverage this system to gauge how successful NSR for CSL can be. Different native-language backgrounds of language learners result in unequal cross-lingual transfer, which has a significant impact on processing second-language data. In particular, manual annotation, empirical evaluation, and error analysis indicate two non-obvious facts: 1) L2-Chinese, L1-Japanese data are more difficult to analyze, and thus annotate, than L2-Chinese, L1-English data; 2) computational models trained on L2-Chinese, L1-Japanese data perform better than models trained on L2-Chinese, L1-English data.

Comparing Knowledge-Intensive and Data-Intensive Models for English Resource Semantic Parsing
Junjie Cao | Zi Lin | Weiwei Sun | Xiaojun Wan
Computational Linguistics, Volume 47, Issue 1 - March 2021

In this work, we present a phenomenon-oriented comparative analysis of the two dominant approaches in English Resource Semantic (ERS) parsing: classic, knowledge-intensive models and neural, data-intensive models. To reflect state-of-the-art neural NLP technologies, a factorization-based parser is introduced that can produce Elementary Dependency Structures much more accurately than previous data-driven parsers. We conduct a suite of tests for different linguistic phenomena to analyze the grammatical competence of different parsers, where we show that, despite comparable performance overall, knowledge- and data-intensive models produce different types of errors, in a way that can be explained by their theoretical properties. This analysis is beneficial for the in-depth evaluation of several representative parsing techniques and leads to new directions for parser development.

2020

Exact yet Efficient Graph Parsing, Bi-directional Locality and the Constructivist Hypothesis
Yajie Ye | Weiwei Sun
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

A key problem in processing graph-based meaning representations is graph parsing, i.e. computing all possible derivations of a given graph according to a (competence) grammar. We demonstrate, for the first time, that exact graph parsing can be efficient for large graphs and with large Hyperedge Replacement Grammars (HRGs). The advance is achieved by exploiting locality as terminal edge-adjacency in HRG rules. In particular, we highlight the importance of 1) a terminal edge-first parsing strategy, 2) a categorization of a subclass of HRG, i.e. what we call Weakly Regular Graph Grammar, and 3) distributing argument-structures to both lexical and phrasal rules.

Parsing into Variable-in-situ Logico-Semantic Graphs
Yufei Chen | Weiwei Sun
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We propose variable-in-situ logico-semantic graphs to bridge the gap between semantic graph parsing and logical form parsing. The new type of graph-based meaning representation allows us to include analysis for scope-related phenomena, such as quantification, negation, and modality, in a way that is consistent with the state-of-the-art underspecification approach. Moreover, the well-formedness of such a graph is clear, since model-theoretic interpretation is available. We demonstrate the effectiveness of this new perspective by developing a new state-of-the-art semantic parser for English Resource Semantics. At the core of this parser is a novel neural graph rewriting system which combines the strengths of Hyperedge Replacement Grammar, a knowledge-intensive model, and Graph Neural Networks, a data-intensive model. Our parser achieves an accuracy of 92.39% in terms of elementary dependency match, which is a 2.88 point improvement over the best data-driven model in the literature. The output of our parser is highly coherent: at least 91% of the graphs are valid, in that they allow at least one sound scope-resolved logical form.

Semantic Parsing for English as a Second Language
Yuanyuan Zhao | Weiwei Sun | Junjie Cao | Xiaojun Wan
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

This paper is concerned with semantic parsing for English as a second language (ESL). Motivated by the theoretical emphasis on the learning challenges that occur at the syntax-semantics interface during second language acquisition, we formulate the task based on the divergence between literal and intended meanings. We combine the complementary strengths of the English Resource Grammar, a linguistically precise hand-crafted deep grammar, and TLE, an existing manually annotated ESL UD-TreeBank, with a novel reranking model. Experiments demonstrate that, in comparison to human annotations, our method can obtain a very promising SemBanking quality. By means of the newly created corpus, we evaluate state-of-the-art semantic parsing as well as grammatical error correction models. The evaluation profiles the performance of neural NLP techniques in handling ESL data and suggests some research directions.

Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies
Gosse Bouma | Yuji Matsumoto | Stephan Oepen | Kenji Sagae | Djamé Seddah | Weiwei Sun | Anders Søgaard | Reut Tsarfaty | Dan Zeman
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies

Coding Textual Inputs Boosts the Accuracy of Neural Networks
Abdul Rafae Khan | Jia Xu | Weiwei Sun
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Natural Language Processing (NLP) tasks are usually performed word by word on textual inputs. We can use arbitrary symbols to represent the linguistic meaning of a word and use these symbols as inputs. As “alternatives” to a text representation, we introduce Soundex, MetaPhone, NYSIIS, and logogram coding to NLP, and develop fixed-output-length coding and its extension using Huffman coding. Each of these codings combines different character/digit sequences and constructs a new vocabulary based on codewords. We find that integrating these codewords with text provides more reliable inputs to neural-network-based NLP systems through redundancy than text-alone inputs. Experiments demonstrate that our approach outperforms state-of-the-art models on machine translation, language modeling, and part-of-speech tagging. The source code is available at https://github.com/abdulrafae/coding_nmt.
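
For illustration, the phonetic codings named above can be reproduced with the jellyfish library (our choice of tooling; the paper's own pipeline is in the linked repository):

```python
import jellyfish

for word in ["translation", "language"]:
    print(word,
          jellyfish.soundex(word),    # Soundex code, e.g. "T652"
          jellyfish.metaphone(word),  # Metaphone code
          jellyfish.nysiis(word))     # NYSIIS code
```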

2019

CUNY-PKU Parser at SemEval-2019 Task 1: Cross-Lingual Semantic Parsing with UCCA
Weimin Lyu | Sheng Huang | Abdul Rafae Khan | Shengqiang Zhang | Weiwei Sun | Jia Xu
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes the systems of the CUNY-PKU team for SemEval 2019 Task 1: Cross-lingual Semantic Parsing with UCCA. We introduce a novel model that applies a cascaded MLP and BiLSTM model, and we ensemble multiple system outputs by reparsing. In particular, we introduce a new decoding algorithm for building the UCCA representation. Our system won first place in one track (French-20K-Open), second place in four tracks (English-Wiki-Open, English-20K-Open, German-20K-Open, and German-20K-Closed), and third place in one track (English-20K-Closed), among all seven tracks.

Graph-Based Meaning Representations: Design and Processing
Alexander Koller | Stephan Oepen | Weiwei Sun
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

This tutorial is on representing and processing sentence meaning in the form of labeled directed graphs. The tutorial will (a) briefly review relevant background in formal and linguistic semantics; (b) semi-formally define a unified abstract view on different flavors of semantic graphs and associated terminology; (c) survey common frameworks for graph-based meaning representation and available graph banks; and (d) offer a technical overview of a representative selection of different parsing approaches.

Peking at MRP 2019: Factorization- and Composition-Based Parsing for Elementary Dependency Structures
Yufei Chen | Yajie Ye | Weiwei Sun
Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the 2019 Conference on Natural Language Learning

We design, implement, and evaluate two semantic parsers, representing factorization- and composition-based approaches respectively, for Elementary Dependency Structures (EDS) at the CoNLL 2019 Shared Task on Cross-Framework Meaning Representation Parsing. The detailed evaluation of the two parsers gives us a new perspective on parsing into linguistically enriched meaning representations: current neural EDS parsers are able to reach accuracy at the inter-annotator agreement level in the same-epoch-and-domain setup.

Parsing Chinese Sentences with Grammatical Relations
Weiwei Sun | Yufei Chen | Xiaojun Wan | Meichun Liu
Computational Linguistics, Volume 45, Issue 1 - March 2019

We report our work on building linguistic resources and data-driven parsers for grammatical relation (GR) analysis of Mandarin Chinese. Chinese, as an analytic language, encodes grammatical information in a highly configurational rather than morphological way. Accordingly, it is possible and reasonable to represent almost all grammatical relations as bilexical dependencies. In this work, we propose to represent grammatical information using general directed dependency graphs; both local and rich long-distance dependencies are explicitly represented. To create high-quality annotations, we take advantage of an existing treebank, the Chinese TreeBank (CTB), which is grounded in Government and Binding theory. We define a set of linguistic rules to exploit CTB’s implicit phrase-structural information and build deep dependency graphs. The reliability of this linguistically motivated GR extraction procedure is highlighted by manual evaluation. Based on the converted corpus, data-driven models, including graph- and transition-based ones, are explored for Chinese GR parsing. For graph-based parsing, a new perspective, graph merging, is proposed for building flexible dependency graphs: constructing complex graphs via constructing simple subgraphs. Two key problems are discussed in this perspective: (1) how to decompose a complex graph into simple subgraphs, and (2) how to combine subgraphs into a coherent complex graph. For transition-based parsing, we introduce a neural parser based on a list-based transition system. We also discuss several other key problems, including dynamic oracles and beam search for neural transition-based parsing. Evaluation gauges how successful GR parsing for Chinese can be with data-driven models. The empirical analysis suggests several directions for future study.

2018

Neural Maximum Subgraph Parsing for Cross-Domain Semantic Dependency Analysis
Yufei Chen | Sheng Huang | Fang Wang | Junjie Cao | Weiwei Sun | Xiaojun Wan
Proceedings of the 22nd Conference on Computational Natural Language Learning

We present experiments for cross-domain semantic dependency analysis with a neural Maximum Subgraph parser. Our parser targets 1-endpoint-crossing, pagenumber-2 graphs, which are a good fit for semantic dependency graphs, and utilizes an efficient dynamic programming algorithm for decoding. For disambiguation, the parser associates words with BiLSTM vectors and utilizes these vectors to assign scores to candidate dependencies. We conduct experiments on the data sets from SemEval 2015 as well as Chinese CCGBank. Our parser achieves very competitive results for both English and Chinese. To improve the parsing performance on cross-domain texts, we propose a data-oriented method to explore, in an implicit way, the linguistic generality encoded in English Resource Grammar, which is a precision-oriented, hand-crafted HPSG grammar. Experiments demonstrate the effectiveness of our data-oriented method across a wide range of conditions.

Accurate SHRG-Based Semantic Parsing
Yufei Chen | Weiwei Sun | Xiaojun Wan
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We demonstrate that an SHRG-based parser can produce semantic graphs much more accurately than previously shown, by relating synchronous production rules to the syntacto-semantic composition process. Our parser achieves an accuracy of 90.35 for EDS (89.51 for DMRS) in terms of elementary dependency match, which is a 4.87 (5.45) point improvement over the best existing data-driven model, indicating, in our view, the importance of linguistically-informed derivation for data-driven semantic parsing. This accuracy is equivalent to that of English Resource Grammar guided models, suggesting that (recurrent) neural network models are able to effectively learn deep linguistic knowledge from annotations.

Language Generation via DAG Transduction
Yajie Ye | Weiwei Sun | Xiaojun Wan
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A DAG automaton is a formal device for manipulating graphs. By augmenting a DAG automaton with transduction rules, a DAG transducer has potential applications in fundamental NLP tasks. In this paper, we propose a novel DAG transducer to perform graph-to-program transformation. The target structure of our transducer is a program licensed by a declarative programming language rather than linguistic structures. By executing such a program, we can easily get a surface string. Our transducer is designed especially for natural language generation (NLG) from type-logical semantic graphs. Taking Elementary Dependency Structures, a format of English Resource Semantics, as input, our NLG system achieves a BLEU-4 score of 68.07. This remarkable result demonstrates the feasibility of applying a DAG transducer to resolve NLG, as well as the effectiveness of our design.

Pre- and In-Parsing Models for Neural Empty Category Detection
Yufei Chen | Yuanyuan Zhao | Weiwei Sun | Xiaojun Wan
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Motivated by the positive impact of empty category on syntactic parsing, we study neural models for pre- and in-parsing detection of empty category, which has not previously been investigated. We find several non-obvious facts: (a) BiLSTM can capture non-local contextual information which is essential for detecting empty categories, (b) even with a BiLSTM, syntactic information is still able to enhance the detection, and (c) automatic detection of empty categories improves parsing quality for overt words. Our neural ECD models outperform the prior state-of-the-art by significant margins.

Semantic Role Labeling for Learner Chinese: the Importance of Syntactic Parsing and L2-L1 Parallel Data
Zi Lin | Yuguang Duan | Yuanyuan Zhao | Weiwei Sun | Xiaojun Wan
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

This paper studies semantic parsing for interlanguage (L2), taking semantic role labeling (SRL) as a case task and learner Chinese as a case language. We first manually annotate the semantic roles for a set of learner texts to derive a gold standard for automatic SRL. Based on the new data, we then evaluate three off-the-shelf SRL systems, i.e., the PCFGLA-parser-based, neural-parser-based, and neural-syntax-agnostic systems, to gauge how successful SRL for learner Chinese can be. We find two non-obvious facts: 1) the L1-sentence-trained systems perform rather badly on the L2 data; 2) the performance drop from the L1 data to the L2 data of the two parser-based systems is much smaller, indicating the importance of syntactic parsing in SRL for interlanguages. Finally, the paper introduces a new agreement-based model to explore the semantic coherency information in the large-scale L2-L1 parallel data. We then show that such information is very effective for enhancing SRL for learner texts. Our model achieves an F-score of 72.06, which is a 2.02 point improvement over the best baseline.

2017

Parsing for Grammatical Relations via Graph Merging
Weiwei Sun | Yantao Du | Xiaojun Wan
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

This paper is concerned with building deep grammatical relation (GR) analysis using a data-driven approach. To deal with this problem, we propose graph merging, a new perspective for building flexible dependency graphs: constructing complex graphs via constructing simple subgraphs. We discuss two key problems in this perspective: (1) how to decompose a complex graph into simple subgraphs, and (2) how to combine subgraphs into a coherent complex graph. Experiments demonstrate the effectiveness of graph merging. Our parser reaches state-of-the-art performance and is significantly better than two transition-based parsers.

The Covert Helps Parse the Overt
Xun Zhang | Weiwei Sun | Xiaojun Wan
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

This paper is concerned with whether deep syntactic information can help surface parsing, with a particular focus on empty categories. We design new algorithms to produce dependency trees in which empty elements are allowed, and evaluate the impact of information about empty category on parsing overt elements. Such information is helpful to reduce the approximation error in a structured parsing model, but increases the search space for inference and accordingly the estimation error. To deal with structure-based overfitting, we propose to integrate disambiguation models with and without empty elements, and perform structure regularization via joint decoding. Experiments on English and Chinese TreeBanks with different parsing models indicate that incorporating empty elements consistently improves surface parsing.

Quasi-Second-Order Parsing for 1-Endpoint-Crossing, Pagenumber-2 Graphs
Junjie Cao | Sheng Huang | Weiwei Sun | Xiaojun Wan
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We propose a new Maximum Subgraph algorithm for first-order parsing to 1-endpoint-crossing, pagenumber-2 graphs. Our algorithm has two characteristics: (1) it separates the construction for noncrossing edges and crossing edges; (2) in a single construction step, whether to create a new arc is deterministic. These two characteristics make our algorithm relatively easy to extend to incorporate crossing-sensitive second-order features. We then introduce a new algorithm for quasi-second-order parsing. Experiments demonstrate that second-order features are helpful for Maximum Subgraph parsing.

Semantic Dependency Parsing via Book Embedding
Weiwei Sun | Junjie Cao | Xiaojun Wan
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We model a dependency graph as a book, a particular kind of topological space, for semantic dependency parsing. The spine of the book is made up of a sequence of words, and each page contains a subset of noncrossing arcs. To build a semantic graph for a given sentence, we design new Maximum Subgraph algorithms to generate noncrossing graphs on each page, and a Lagrangian Relaxation-based algorithm to combine pages into a book. Experiments demonstrate the effectiveness of the book embedding framework across a wide range of conditions. Our parser obtains comparable results with a state-of-the-art transition-based parser.

Parsing to 1-Endpoint-Crossing, Pagenumber-2 Graphs
Junjie Cao | Sheng Huang | Weiwei Sun | Xiaojun Wan
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We study the Maximum Subgraph problem in deep dependency parsing. We consider two restrictions on deep dependency graphs: (a) 1-endpoint-crossing and (b) pagenumber-2. Our main contribution is an exact algorithm that obtains maximum subgraphs satisfying both restrictions simultaneously in O(n^5) time. Moreover, ignoring one linguistically rare structure decreases the complexity to O(n^4). We also extend our quartic-time algorithm into a practical parser with a discriminative disambiguation model and evaluate its performance on four linguistic data sets used in semantic dependency parsing.

2016

Transition-Based Parsing for Deep Dependency Structures
Xun Zhang | Yantao Du | Weiwei Sun | Xiaojun Wan
Computational Linguistics, Volume 42, Issue 3 - September 2016

Towards Accurate and Efficient Chinese Part-of-Speech Tagging
Weiwei Sun | Xiaojun Wan
Computational Linguistics, Volume 42, Issue 3 - September 2016

2015

Peking: Building Semantic Dependency Graphs with a Hybrid Parser
Yantao Du | Fan Zhang | Xun Zhang | Weiwei Sun | Xiaojun Wan
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

A Data-Driven, Factorization Parser for CCG Dependency Structures
Yantao Du | Weiwei Sun | Xiaojun Wan
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

Peking: Profiling Syntactic Tree Parsing Techniques for Semantic Graph Parsing
Yantao Du | Fan Zhang | Weiwei Sun | Xiaojun Wan
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

Grammatical Relations in Chinese: GB-Ground Extraction and Data-Driven Parsing
Weiwei Sun | Yantao Du | Xin Kou | Shuoyang Ding | Xiaojun Wan
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2013

Data-driven, PCFG-based and Pseudo-PCFG-based Models for Chinese Dependency Parsing
Weiwei Sun | Xiaojun Wan
Transactions of the Association for Computational Linguistics, Volume 1

We present a comparative study of transition-, graph-, and PCFG-based models aimed at illuminating more precisely the likely contribution of CFGs to improving Chinese dependency parsing accuracy, especially by combining heterogeneous models. Inspired by the impact of a constituency grammar on dependency parsing, we propose several strategies to acquire pseudo CFGs only from dependency annotations. Compared to linguistic grammars learned from rich phrase-structure treebanks, well-designed pseudo grammars achieve similar parsing accuracy and make equivalent contributions to parser ensembles. Moreover, pseudo grammars increase the diversity of base models and therefore, together with all other models, further improve system combination. Based on automatic POS tagging, our final model achieves a UAS of 87.23%, a significant improvement over the state of the art.

Capturing Long-distance Dependencies in Sequence Models: A Case Study of Chinese Part-of-speech Tagging
Weiwei Sun | Xiaochang Peng | Xiaojun Wan
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2012

Reducing Approximation and Estimation Errors for Chinese Lexical Processing with Heterogeneous Annotations
Weiwei Sun | Xiaojun Wan
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging
Weiwei Sun | Hans Uszkoreit
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Semantic Cohesion Model for Phrase-Based SMT
Minwei Feng | Weiwei Sun | Hermann Ney
Proceedings of COLING 2012

2011

A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging
Weiwei Sun
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Generating Virtual Parallel Corpus: A Compatibility Centric Method
Jia Xu | Weiwei Sun
Proceedings of Machine Translation Summit XIII: Papers

Enhancing Chinese Word Segmentation Using Unlabeled Data
Weiwei Sun | Jia Xu
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

Semantics-Driven Shallow Parsing for Chinese Semantic Role Labeling
Weiwei Sun
Proceedings of the ACL 2010 Conference Short Papers

Improving Chinese Semantic Role Labeling with Rich Syntactic Features
Weiwei Sun
Proceedings of the ACL 2010 Conference Short Papers

Discriminative Parse Reranking for Chinese with Homogeneous and Heterogeneous Annotations
Weiwei Sun | Rui Wang | Yi Zhang
CIPS-SIGHAN Joint Conference on Chinese Language Processing

Word-based and Character-based Word Segmentation Models: Comparison and Combination
Weiwei Sun
Coling 2010: Posters

Automatic Acquisition of Chinese Novel Noun Compounds
Meng Wang | Chu-Ren Huang | Shiwen Yu | Weiwei Sun
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Automatic acquisition of novel compounds is notoriously difficult because most novel compounds have relatively low frequency in a corpus. The current study proposes a new method to deal with the novel compound acquisition challenge. We model this task as a two-class classification problem in which a candidate compound is classified as either a compound or a non-compound. A machine learning method using SVMs, incorporating two types of linguistically motivated features, semantic features and character features, is applied to identify rare but valid noun compounds. We explore two kinds of training data: one is virtual training data, obtained from the frequent compounds by three statistical scores, i.e., co-occurrence frequency, mutual information, and dependent ratio; the other is real training data, randomly selected from the infrequent compounds. We conduct comparative experiments, and the results show that even with limited direct evidence in the corpus for the novel compounds, we can make full use of the typical frequent compounds to help discover the novel compounds.
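
A schematic sketch of the two-class setup described above, with toy stand-ins for the semantic and character features (scikit-learn is our choice of tooling; the paper does not specify an implementation):

```python
from sklearn.svm import SVC
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline

def features(candidate):
    # Toy character features for a two-word candidate; the paper's semantic
    # features (e.g. from a lexical ontology) would be added here.
    w1, w2 = candidate
    return {"head_char": w2[-1], "mod_char": w1[0], "len": len(w1) + len(w2)}

train = [(("电子", "词典"), 1),   # "electronic dictionary": a valid compound
         (("吃饭", "词典"), 0)]   # an implausible pairing: non-compound
model = make_pipeline(DictVectorizer(), SVC(kernel="linear"))
model.fit([features(c) for c, _ in train], [y for _, y in train])
print(model.predict([features(("电子", "邮件"))]))  # classify a new candidate
```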

2009

Chinese Semantic Role Labeling with Shallow Parsing
Weiwei Sun | Zhifang Sui | Meng Wang | Xin Wang
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

Chinese Function Tag Labeling
Weiwei Sun | Zhifang Sui
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 2

Prediction of Thematic Rank for Structured Semantic Role Labeling
Weiwei Sun | Zhifang Sui | Meng Wang
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

2008

The Integration of Dependency Relation Classification and Semantic Role Labeling Using Bilayer Maximum Entropy Markov Models
Weiwei Sun | Hongzhan Li | Zhifang Sui
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning

Prediction of Maximal Projection for Semantic Role Labeling
Weiwei Sun | Zhifang Sui | Haifeng Wang
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)