Zipeng Chen
2021
A Span-based Dynamic Local Attention Model for Sequential Sentence Classification
Xichen Shang
|
Qianli Ma
|
Zhenxi Lin
|
Jiangyue Yan
|
Zipeng Chen
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Sequential sentence classification aims to classify each sentence in the document based on the context in which sentences appear. Most existing work addresses this problem using a hierarchical sequence labeling network. However, they ignore considering the latent segment structure of the document, in which contiguous sentences often have coherent semantics. In this paper, we proposed a span-based dynamic local attention model that could explicitly capture the structural information by the proposed supervised dynamic local attention. We further introduce an auxiliary task called span-based classification to explore the span-level representations. Extensive experiments show that our model achieves better or competitive performance against state-of-the-art baselines on two benchmark datasets.
2020
MODE-LSTM: A Parameter-efficient Recurrent Network with Multi-Scale for Sentence Classification
Qianli Ma
|
Zhenxi Lin
|
Jiangyue Yan
|
Zipeng Chen
|
Liuhong Yu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
The central problem of sentence classification is to extract multi-scale n-gram features for understanding the semantic meaning of sentences. Most existing models tackle this problem by stacking CNN and RNN models, which easily leads to feature redundancy and overfitting because of relatively limited datasets. In this paper, we propose a simple yet effective model called Multi-scale Orthogonal inDependEnt LSTM (MODE-LSTM), which not only has effective parameters and good generalization ability, but also considers multiscale n-gram features. We disentangle the hidden state of the LSTM into several independently updated small hidden states and apply an orthogonal constraint on their recurrent matrices. We then equip this structure with sliding windows of different sizes for extracting multi-scale n-gram features. Extensive experiments demonstrate that our model achieves better or competitive performance against state-of-the-art baselines on eight benchmark datasets. We also combine our model with BERT to further boost the generalization performance.