2017
Deep Pyramid Convolutional Neural Networks for Text Categorization
Rie Johnson | Tong Zhang
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
This paper proposes a low-complexity word-level deep convolutional neural network (CNN) architecture for text categorization that can efficiently represent long-range associations in text. In the literature, several deep and complex neural networks have been proposed for this task, assuming availability of relatively large amounts of training data. However, the associated computational complexity increases as the networks go deeper, which poses serious challenges in practical applications. Moreover, it was shown recently that shallow word-level CNNs are more accurate and much faster than the state-of-the-art very deep nets such as character-level CNNs even in the setting of large training data. Motivated by these findings, we carefully studied deepening of word-level CNNs to capture global representations of text, and found a simple network architecture with which the best accuracy can be obtained by increasing the network depth without increasing computational cost by much. We call it deep pyramid CNN. The proposed model with 15 weight layers outperforms the previous best models on six benchmark datasets for sentiment classification and topic categorization.
2015
Effective Use of Word Order for Text Categorization with Convolutional Neural Networks
Rie Johnson | Tong Zhang
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
2006
Analysis of TimeBank as a Resource for TimeML Parsing
Branimir Boguraev | Rie Kubota Ando
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
In our work, we present an analysis of the TimeBank corpus (the only available reference sample of TimeML-compliant annotation) from the point of view of its utility as a training resource for developing automated TimeML annotators. We are encouraged by experimental results indicative of the potential of TimeBank; at the same time, closer inspection of causes for some systematic errors shows off certain deficiencies in the corpus, primarily to do with small size and inconsistent annotation. Our analysis suggests that even a reference resource, developed outside of a rigorous process of training corpus design and creation, can be extremely valuable for training and development purposes. The analysis also highlights areas of correction and improvement for evolving the current reference corpus into a community infrastructure resource.
Applying Alternating Structure Optimization to Word Sense Disambiguation
Rie Kubota Ando
Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X)
2005
A High-Performance Semi-Supervised Learning Method for Text Chunking
Rie Ando | Tong Zhang
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)
2004
Exploiting Unannotated Corpora for Tagging and Chunking
Rie Kubota Ando
Proceedings of the ACL Interactive Poster and Demonstration Sessions
Semantic Lexicon Construction: Learning from Unlabeled Data via Spectral Analysis
Rie Kubota Ando
Proceedings of the Eighth Conference on Computational Natural Language Learning (CoNLL-2004) at HLT-NAACL 2004
2000
Mostly-Unsupervised Statistical Segmentation of Japanese: Applications to Kanji
Rie Kubota Ando | Lillian Lee
1st Meeting of the North American Chapter of the Association for Computational Linguistics
Multi-document Summarization by Visualizing Topical Content
Rie Kubota Ando | Branimir K. Boguraev | Roy J. Byrd | Mary S. Neff
NAACL-ANLP 2000 Workshop: Automatic Summarization