%0 Conference Proceedings
%T PRADO: Projection Attention Networks for Document Classification On-Device
%A Kaliamoorthi, Prabhu
%A Ravi, Sujith
%A Kozareva, Zornitsa
%Y Inui, Kentaro
%Y Jiang, Jing
%Y Ng, Vincent
%Y Wan, Xiaojun
%S Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
%D 2019
%8 November
%I Association for Computational Linguistics
%C Hong Kong, China
%F kaliamoorthi-etal-2019-prado
%X Recently, there has been a great interest in the development of small and accurate neural networks that run entirely on devices such as mobile phones, smart watches and IoT. This enables user privacy, consistent user experience and low latency. Although a wide range of applications have been targeted from wake word detection to short text classification, yet there are no on-device networks for long text classification. We propose a novel projection attention neural network PRADO that combines trainable projections with attention and convolutions. We evaluate our approach on multiple large document text classification tasks. Our results show the effectiveness of the trainable projection model in finding semantically similar phrases and reaching high performance while maintaining compact size. Using this approach, we train tiny neural networks just 200 Kilobytes in size that improve over prior CNN and LSTM models and achieve near state of the art performance on multiple long document classification tasks. We also apply our model for transfer learning, show its robustness and ability to further improve the performance in limited data scenarios.
%R 10.18653/v1/D19-1506
%U https://aclanthology.org/D19-1506
%U https://doi.org/10.18653/v1/D19-1506
%P 5012-5021