Neural Models for Documents with Metadata

Dallas Card, Chenhao Tan, Noah A. Smith


Abstract
Most real-world document collections involve various types of metadata, such as author, source, and date, and yet the most commonly used approaches to modeling text corpora ignore this information. While specialized models have been developed for particular applications, few are widely used in practice, as customization typically requires derivation of a custom inference algorithm. In this paper, we build on recent advances in variational inference methods and propose a general neural framework, based on topic models, to enable flexible incorporation of metadata and allow for rapid exploration of alternative models. Our approach achieves strong performance, with a manageable tradeoff between perplexity, coherence, and sparsity. Finally, we demonstrate the potential of our framework through an exploration of a corpus of articles about US immigration.
Anthology ID:
P18-1189
Volume:
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2018
Address:
Melbourne, Australia
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
2031–2040
URL:
https://aclanthology.org/P18-1189
DOI:
10.18653/v1/P18-1189
Cite (ACL):
Dallas Card, Chenhao Tan, and Noah A. Smith. 2018. Neural Models for Documents with Metadata. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2031–2040, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
Neural Models for Documents with Metadata (Card et al., ACL 2018)
PDF:
https://aclanthology.org/P18-1189.pdf
Note:
 P18-1189.Notes.pdf
Presentation:
 P18-1189.Presentation.pdf
Video:
 https://vimeo.com/285805040
Code:
 dallascard/scholar
Data:
IMDb Movie Reviews