Named entity recognition (NER) is a fundamental task in natural language processing. Recently, NER has been formulated as a machine reading comprehension (MRC) task, in which manually crafted queries are used to extract entities of different types. However, current MRC-based NER techniques are limited to extracting a single entity type at a time and are largely geared towards resource-rich settings. This renders them inefficient at inference time and leaves their potential untapped in low-resource settings. To address these issues, we propose a query-parallel MRC-based approach that extracts multiple entity types concurrently and is applicable to both resource-rich and resource-limited settings. Specifically, we propose a query-parallel encoder that uses a query-segmented attention mechanism to isolate the semantics of queries and model the query-context interaction with a unidirectional flow. This allows for easier generalization to new entity types and transfer to new domains. The query and context representations produced by the encoder are then fed into a query-conditioned biaffine predictor to extract multiple entities at once. The model is trained with a parameter-efficient tuning technique, making it more data-efficient. We conduct extensive experiments and demonstrate that our model performs competitively against strong baselines in resource-rich settings and achieves state-of-the-art results in low-resource settings, including training-from-scratch, in-domain transfer, and cross-domain transfer tasks.
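As an illustration of the query-segmented attention idea, the sketch below builds a boolean attention mask in which each query's tokens attend only within that query, while query-context information flows in one direction only. The direction of the flow (queries reading from a context that is encoded independently of them, which is one way to make new queries pluggable) and all shapes are our assumptions for illustration, not the paper's exact formulation.

```python
import torch

def query_segmented_mask(query_lens, context_len):
    """Boolean attention mask (True = attention allowed). Each query attends
    only to itself and to the context; the context attends only to itself,
    so query semantics stay isolated and the query-context interaction is
    unidirectional. The flow direction is an assumption for illustration."""
    total_q = sum(query_lens)
    size = total_q + context_len
    mask = torch.zeros(size, size, dtype=torch.bool)
    offset = 0
    for qlen in query_lens:
        mask[offset:offset + qlen, offset:offset + qlen] = True  # intra-query only
        mask[offset:offset + qlen, total_q:] = True              # query -> context
        offset += qlen
    mask[total_q:, total_q:] = True                              # context -> context
    return mask

# e.g., two 3-token queries followed by a 5-token context
print(query_segmented_mask([3, 3], 5).int())
```

Because the context rows never attend to query columns, adding a query for a new entity type leaves the context representation untouched, which is the property the abstract ties to generalization.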
Natural Language Understanding (NLU) is one of the most critical components of task-oriented dialogue, and it is often cast as an intent classification task. To achieve strong intent identification performance, system designers often need to hire a large number of domain experts to label data, which is inefficient and costly. To address this problem, researchers' attention has gradually shifted to automatic intent clustering methods, which employ low-resource unsupervised approaches to solve the classification problem. The classical framework here is deep clustering, which uses deep neural networks (DNNs) to jointly optimize a non-clustering loss and a clustering loss. However, for new conversational domains or services, the utterances needed to assign intents are scarce, and the performance of DNNs often depends on large amounts of data. In addition, although re-clustering with the k-means algorithm after training the network usually leads to better results, k-means often suffers from poor stability. To address these problems, we propose an effective two-stage progressive approach to refine the clustering. First, we pre-train the network with a contrastive loss on all conversation data and then optimize the clustering loss and the contrastive loss simultaneously. Second, we propose adaptive progressive k-means to alleviate the randomness of vanilla k-means, achieving better performance with smaller deviation. Our method ranks second in DSTC11 Track 2 Task 1, a benchmark for intent clustering of task-oriented dialogue, demonstrating its superiority and effectiveness.
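To make the stability issue concrete, the sketch below shows one simple way to reduce the seed sensitivity of vanilla k-means: run it several times and keep the partition that agrees most with the other runs. This is only an illustrative baseline, not the paper's adaptive progressive k-means, which is a more elaborate scheme; the function name and hyperparameters are our own.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def stable_kmeans(X, n_clusters, n_runs=10, seed=0):
    """Illustrative stability baseline: run k-means with different seeds and
    return the labeling most consistent (by mean ARI) with all other runs."""
    labelings = [
        KMeans(n_clusters=n_clusters, n_init=1, random_state=seed + r).fit_predict(X)
        for r in range(n_runs)
    ]
    # score each run by its average agreement with the remaining runs
    scores = [
        np.mean([adjusted_rand_score(a, b) for b in labelings if b is not a])
        for a in labelings
    ]
    return labelings[int(np.argmax(scores))]
```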
Learning high-quality sentence representations is a fundamental problem in natural language processing that could benefit a wide range of downstream tasks. Though BERT-like pre-trained language models have achieved great success, using their sentence representations directly often results in poor performance on the semantic textual similarity task. Recently, several contrastive learning methods have been proposed for learning sentence representations and have shown promising results. However, most of them focus on the construction of positive and negative representation pairs and pay little attention to the training objective, such as NT-Xent, which is not sufficient to acquire discriminating power and is unable to model the partial order of semantics between sentences. In this paper, we propose a new method, ArcCSE, with training objectives designed to enhance pairwise discriminative power and to model the entailment relation of sentence triplets. We conduct extensive experiments that demonstrate that our approach outperforms the previous state-of-the-art on diverse sentence-related tasks, including STS and SentEval.
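The kind of objective the abstract contrasts with plain NT-Xent can be sketched as an ArcFace-style additive angular margin applied inside the contrastive softmax. The margin and temperature values below, and the exact placement of the margin, are illustrative assumptions rather than the paper's reported settings.

```python
import torch
import torch.nn.functional as F

def arc_contrastive_loss(z1, z2, margin=0.1745, temperature=0.05):
    """Sketch of an angular-margin contrastive objective: the angle between
    each positive pair is penalized by an additive margin before the softmax,
    tightening the decision boundary relative to plain NT-Xent."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    sim = z1 @ z2.t()                                   # batch x batch cosine sims
    theta = torch.acos(sim.clamp(-1 + 1e-7, 1 - 1e-7))  # angles
    pos = torch.arange(z1.size(0), device=z1.device)    # positives on the diagonal
    sim_margin = sim.clone()
    sim_margin[pos, pos] = torch.cos(theta[pos, pos] + margin)
    return F.cross_entropy(sim_margin / temperature, pos)
```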
On the WikiSQL benchmark, most methods tackle the challenge of text-to-SQL with predefined sketch slots and build sophisticated sub-tasks to fill these slots. Though achieving promising results, these methods suffer from overly complex model structures. In this paper, we present a simple yet effective approach that enables an auto-regressive sequence-to-sequence model to perform robust text-to-SQL generation. Instead of formulating text-to-SQL as slot filling, we propose to train the sequence-to-sequence model with Schema-aware Denoising (SeaD), which consists of two denoising objectives that train the model to either recover the input or predict the output under two novel noise types, erosion and shuffle. These model-agnostic denoising objectives act as auxiliary tasks for structural data modeling during sequence-to-sequence generation. In addition, we propose a clause-sensitive execution-guided (EG) decoding strategy to overcome the limitations of EG decoding for generative models. Experiments show that the proposed method improves the performance of the sequence-to-sequence model in both schema linking and grammar correctness and establishes a new state-of-the-art on the WikiSQL benchmark. Our work indicates that the capacity of sequence-to-sequence models for text-to-SQL may have been underestimated and can be enhanced by specialized denoising tasks.
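The two noise types can be pictured as simple corruptions of the input token sequence that the model must undo. The sketches below are our guesses at the general shape of such corruptions (window size, mask rate, and function names are illustrative); the paper's actual recipe ties the noise to the schema itself.

```python
import random

def shuffle_noise(tokens, span=3, rng=random):
    """Illustrative shuffle noise: permute tokens within local windows so the
    model must restore the original ordering."""
    out = list(tokens)
    for i in range(0, len(out), span):
        window = out[i:i + span]
        rng.shuffle(window)
        out[i:i + span] = window
    return out

def erosion_noise(tokens, p_drop=0.15, mask_token="<mask>", rng=random):
    """Illustrative erosion noise: randomly mask tokens (e.g., schema items)
    and ask the model to reconstruct the uncorrupted sequence."""
    return [mask_token if rng.random() < p_drop else t for t in tokens]
```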
Emotion recognition in conversation (ERC) has attracted much attention in recent years owing to its necessity in widespread applications. With the development of graph neural networks (GNNs), recent state-of-the-art ERC models mostly use GNNs to embed the intrinsic structure of a conversation into the utterance features. In this paper, we propose a novel GNN-based model for ERC, namely S+PAGE, to better capture speaker- and position-aware conversation structure. Specifically, we add relative positional encoding and speaker dependency encoding to the representations of edge weights and edge types, respectively, to obtain a more reasonable aggregation algorithm for ERC. Besides, a two-stream conversational Transformer is presented to extract both the self- and inter-speaker contextual features of each utterance. Extensive experiments are conducted on four ERC benchmarks with state-of-the-art models employed as baselines, and the results demonstrate the superiority of our model.
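The speaker- and position-aware graph construction can be illustrated with a small helper that links each utterance to its neighbors within a window, typing each edge by whether its endpoints share a speaker and attaching the relative position as an edge attribute. The window size and the intra/inter typing below are our assumptions for illustration, not the exact S+PAGE construction.

```python
def edge_attributes(speakers, window=2):
    """Sketch of speaker- and position-aware edges: each utterance connects to
    neighbors within a window; the edge type encodes speaker identity (same vs.
    different speaker) and the last field carries the relative position."""
    edges = []
    for i, speaker_i in enumerate(speakers):
        for j in range(max(0, i - window), min(len(speakers), i + window + 1)):
            if i == j:
                continue
            edge_type = "intra" if speakers[j] == speaker_i else "inter"
            edges.append((i, j, edge_type, j - i))  # (src, dst, type, rel. pos.)
    return edges

# e.g., a three-turn conversation between speakers A and B
print(edge_attributes(["A", "B", "A"], window=1))
```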
This paper presents our submission to Task 10, Structured Sentiment Analysis, of the SemEval 2022 competition. The task aims to extract all elements of the fine-grained sentiment in a text. We cast structured sentiment analysis as the prediction of sentiment graphs following Barnes et al. (2021), where nodes are spans of sentiment holders, targets, and expressions, and directed edges denote the relation types between them. Our approach closely follows that of semantic dependency parsing (Dozat and Manning, 2018). The difference is that we use pre-trained language models (e.g., BERT and RoBERTa) as the text encoder to mitigate the problem of limited annotated data. Additionally, we improve the computation of cross attention and present a suffix masking technique for a further gain in performance. Notably, our model achieved the top average Sentiment Graph F1 score on seven datasets in five different languages in the monolingual subtask.
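A minimal biaffine edge scorer in the style of Dozat and Manning (2018), the parsing backbone the submission builds on, can be sketched as below. The class layout and bias handling are illustrative, and none of the submission's own modifications (improved cross attention, suffix masking) are shown.

```python
import torch
import torch.nn as nn

class Biaffine(nn.Module):
    """Minimal biaffine scorer: for head and dependent representations, score
    every (head, dependent, label) triple with a bilinear form plus bias terms
    (realized here by appending a constant 1 feature to both sides)."""
    def __init__(self, dim, n_labels):
        super().__init__()
        self.U = nn.Parameter(torch.empty(n_labels, dim + 1, dim + 1))
        nn.init.xavier_uniform_(self.U)

    def forward(self, head, dep):
        # head, dep: (batch, seq, dim); append a bias feature of ones
        ones = head.new_ones(*head.shape[:-1], 1)
        head = torch.cat([head, ones], dim=-1)
        dep = torch.cat([dep, ones], dim=-1)
        # returns scores of shape (batch, n_labels, seq, seq)
        return torch.einsum("bxi,lij,byj->blxy", head, self.U, dep)
```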
Different from other text generation tasks, in product description generation it is of vital importance to generate faithful descriptions that stick to the product attribute information. However, little attention has been paid to this problem. To bridge this gap, we propose a model named Fidelity-oriented Product Description Generator (FPDG). FPDG takes the entity label of each word into account, since product attribute information is usually conveyed by entity words. Specifically, we first propose a Recurrent Neural Network (RNN) decoder based on an Entity-label-guided Long Short-Term Memory (ELSTM) cell, which takes both the embedding and the entity label of each word as input. Second, we establish a keyword memory that stores entity labels as keys and keywords as values, so that FPDG attends to keywords through attending to their entity labels. Experiments conducted on a large-scale real-world product description dataset show that our model achieves state-of-the-art performance in terms of both traditional generation metrics and human evaluations. Specifically, FPDG increases the fidelity of the generated descriptions by 25%.
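The keyword memory can be pictured as key-value attention in which entity labels serve as keys and keywords as values, so the decoder reaches a keyword by attending to its label. The sketch below illustrates that read operation only; tensor shapes and the function name are our assumptions, and the ELSTM cell itself is not shown.

```python
import torch
import torch.nn.functional as F

def keyword_memory_read(query, label_keys, keyword_values):
    """Sketch of the keyword-memory read: score entity-label keys against the
    decoder state and return the weighted sum of the keyword values, so
    keywords are attended to through their entity labels."""
    # query: (batch, dim); label_keys, keyword_values: (batch, n_keys, dim)
    scores = torch.einsum("bd,bkd->bk", query, label_keys)
    weights = F.softmax(scores, dim=-1)
    return torch.einsum("bk,bkd->bd", weights, keyword_values)
```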