Nan Yu


RST Discourse Parsing with Second-Stage EDU-Level Pre-training
Nan Yu | Meishan Zhang | Guohong Fu | Min Zhang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Pre-trained language models (PLMs) have shown great potential in natural language processing (NLP), including rhetorical structure theory (RST) discourse parsing. Current PLMs are obtained by sentence-level pre-training, which does not match the basic processing unit of RST parsing, i.e., the elementary discourse unit (EDU). To this end, we propose a second-stage EDU-level pre-training approach, which introduces two novel tasks to continually learn effective EDU representations on top of well pre-trained language models. Concretely, the two tasks are (1) next EDU prediction (NEP) and (2) discourse marker prediction (DMP). We take a state-of-the-art transition-based neural parser as the baseline and adapt it with a light bi-gram EDU modification to make effective use of the EDU-level pre-trained representations. Experimental results on a benchmark dataset show that our method is highly effective, leading to a 2.1-point improvement in F1-score. All code and pre-trained models will be released publicly to facilitate future studies.

Speaker-Aware Discourse Parsing on Multi-Party Dialogues
Nan Yu | Guohong Fu | Min Zhang
Proceedings of the 29th International Conference on Computational Linguistics

Discourse parsing on multi-party dialogues is an important but difficult task in dialogue systems and conversational analysis. Speaker interactions are believed to be helpful for this task, yet most previous research ignores interactions between different speakers. To this end, we present a speaker-aware model. Concretely, we propose a speaker-context interaction joint encoding (SCIJE) approach that uses the interaction features between different speakers. In addition, we propose a second-stage pre-training task, same speaker prediction (SSP), which enhances the conversational context representations by predicting whether two utterances are from the same speaker. Experiments on two standard benchmark datasets show that the proposed model achieves the best-reported performance in the literature. We will release the code of this paper to facilitate future research.
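
The SSP objective described above needs utterance pairs labelled by whether they share a speaker. The abstract does not say how pairs are sampled, so the following is only a minimal sketch under assumptions: the function name `build_ssp_pairs`, the `max_distance` window, and the exhaustive pairing strategy are all hypothetical and are not taken from the paper.

```python
from itertools import combinations
from typing import List, Optional, Tuple

Turn = Tuple[str, str]  # (speaker, utterance)


def build_ssp_pairs(
    dialogue: List[Turn], max_distance: Optional[int] = None
) -> List[Tuple[str, str, int]]:
    """Build (utterance_a, utterance_b, label) examples for one dialogue.

    label is 1 when both utterances come from the same speaker, else 0.
    max_distance (hypothetical) limits how far apart the two turns may be.
    """
    pairs = []
    for (i, (spk_a, utt_a)), (j, (spk_b, utt_b)) in combinations(
        enumerate(dialogue), 2
    ):
        # Skip pairs that are further apart than the allowed window.
        if max_distance is not None and j - i > max_distance:
            continue
        pairs.append((utt_a, utt_b, int(spk_a == spk_b)))
    return pairs


toy_dialogue = [("A", "hi"), ("B", "hello"), ("A", "how are you")]
ssp_examples = build_ssp_pairs(toy_dialogue)
```

In a real pipeline each pair would be fed to the encoder with a binary classification head; the sketch stops at data construction because the abstract gives no further detail.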


A Discourse-Aware Graph Neural Network for Emotion Recognition in Multi-Party Conversation
Yang Sun | Nan Yu | Guohong Fu
Findings of the Association for Computational Linguistics: EMNLP 2021

Emotion recognition in multi-party conversation (ERMC) is becoming increasingly popular as an emerging research topic in natural language processing. Prior research focuses on exploring sequential information but ignores the discourse structures of conversations. In this paper, we investigate the importance of discourse structures in handling informative contextual cues and speaker-specific features for ERMC. To this end, we propose a discourse-aware graph neural network (ERMC-DisGCN) for ERMC. In particular, we design a relational convolution to leverage the self-speaker dependency of interlocutors to propagate contextual information. Furthermore, we exploit a gated convolution to select more informative cues for ERMC from dependent utterances. The experimental results show that our method outperforms multiple baselines, illustrating that discourse structures are of great value to ERMC.

Discontinuous Named Entity Recognition as Maximal Clique Discovery
Yucheng Wang | Bowen Yu | Hongsong Zhu | Tingwen Liu | Nan Yu | Limin Sun
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Named entity recognition (NER) remains challenging when entity mentions can be discontinuous. Existing methods break the recognition process into several sequential steps. In training, they predict conditioned on the gold intermediate results, while at inference they rely on the model output of the previous steps, which introduces exposure bias. To solve this problem, we first construct a segment graph for each sentence, in which each node denotes a segment (a continuous entity on its own, or a part of discontinuous entities), and an edge links two nodes that belong to the same entity. The nodes and edges can be generated respectively in one stage with a grid tagging scheme and learned jointly using a novel architecture named Mac. Then discontinuous NER can be reformulated as a non-parametric process of discovering maximal cliques in the graph and concatenating the spans in each clique. Experiments on three benchmarks show that our method outperforms the state-of-the-art (SOTA) results, with an improvement of up to 3.5 percentage points in F1, and achieves a 5x speedup over the SOTA model.
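
The decoding step described above (find maximal cliques in the segment graph, then concatenate the spans of each clique) can be illustrated with a small sketch. This is not the Mac architecture: the segments and edges are supplied by hand instead of being predicted by a grid tagger, the clique search is a plain Bron-Kerbosch enumeration without pivoting, and the names `maximal_cliques` and `decode_entities` as well as the toy sentence are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token offsets


def maximal_cliques(adj: Dict[Span, Set[Span]]) -> List[Set[Span]]:
    """Enumerate all maximal cliques with basic Bron-Kerbosch."""
    cliques: List[Set[Span]] = []

    def expand(r: Set[Span], p: Set[Span], x: Set[Span]) -> None:
        if not p and not x:
            cliques.append(r)  # r cannot be extended any further
            return
        for v in sorted(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}  # v has been fully explored
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def decode_entities(
    tokens: List[str], segments: List[Span], edges: List[Tuple[Span, Span]]
) -> List[str]:
    """Turn a segment graph into entity strings, one per maximal clique."""
    adj: Dict[Span, Set[Span]] = {seg: set() for seg in segments}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    entities = []
    for clique in maximal_cliques(adj):
        # Concatenate the clique's spans in left-to-right order.
        words = [w for s, e in sorted(clique) for w in tokens[s : e + 1]]
        entities.append(" ".join(words))
    return sorted(entities)


tokens = ["severe", "joint", ",", "shoulder", "and", "back", "pain"]
segments = [(0, 0), (1, 1), (3, 3), (5, 5), (6, 6)]
edges = [
    ((0, 0), (1, 1)), ((0, 0), (3, 3)), ((0, 0), (5, 5)), ((0, 0), (6, 6)),
    ((1, 1), (6, 6)), ((3, 3), (6, 6)), ((5, 5), (6, 6)),
]
entities = decode_entities(tokens, segments, edges)
```

A segment with no edges forms a clique of size one, so a continuous entity that stands on its own is recovered by the same procedure.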


Transition-based Neural RST Parsing with Implicit Syntax Features
Nan Yu | Meishan Zhang | Guohong Fu
Proceedings of the 27th International Conference on Computational Linguistics

Syntax has been a useful source of information for statistical RST discourse parsing. Under the neural setting, a common approach integrates syntax through a recursive neural network (RNN), requiring discrete output trees produced by a supervised syntax parser. In this paper, we propose an implicit syntax feature extraction approach, using hidden-layer vectors extracted from a neural syntax parser. In addition, we propose a simple transition-based model as the baseline, further enhancing it with a dynamic oracle. Experiments on the standard dataset show that our baseline model with a dynamic oracle is highly competitive. When implicit syntax features are integrated, we are able to obtain further improvements, outperforming the explicit Tree-RNN approach.
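
For readers unfamiliar with transition-based parsing, the general idea can be sketched as a shift-reduce loop over EDUs. The abstract does not specify the transition system, so this is a generic illustration under assumptions: the action names `SHIFT` and `REDUCE` and the function `apply_transitions` are hypothetical, nuclearity and relation labels are omitted, and the actions are given as input where a trained model would predict them.

```python
from collections import deque
from typing import List, Sequence, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def apply_transitions(edus: Sequence[str], actions: Sequence[str]) -> Tree:
    """Build a binary discourse tree from EDUs and a transition sequence."""
    queue = deque(edus)       # EDUs not yet processed
    stack: List[Tree] = []    # partially built subtrees
    for action in actions:
        if action == "SHIFT":
            if not queue:
                raise ValueError("SHIFT with an empty queue")
            stack.append(queue.popleft())
        elif action == "REDUCE":
            if len(stack) < 2:
                raise ValueError("REDUCE needs two subtrees on the stack")
            right = stack.pop()
            left = stack.pop()
            stack.append((left, right))  # merge the top two subtrees
        else:
            raise ValueError(f"unknown action: {action}")
    if queue or len(stack) != 1:
        raise ValueError("transition sequence does not yield a single tree")
    return stack[0]


tree = apply_transitions(
    ["e1", "e2", "e3"], ["SHIFT", "SHIFT", "REDUCE", "SHIFT", "REDUCE"]
)
```

For n EDUs a complete sequence always contains n SHIFT and n - 1 REDUCE actions; a dynamic oracle, as mentioned in the abstract, concerns which action to treat as correct during training when the parser has already deviated from the gold sequence.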