HOTTER: Hierarchical Optimal Topic Transport with Explanatory Context Representations

Sabine Wehnert, Christian Scheel, Simona Szakács-Behling, Maret Nieländer, Patrick Mielke, Ernesto William De Luca


Abstract
Natural language processing (NLP) is often the backbone of today’s systems for user interaction, information retrieval, and other tasks. Many such NLP applications rely on specialized learned representations (e.g. neural word embeddings, topic models) that improve the ability to reason about the relationships between documents of a corpus. Paired with the progress in learned representations, the similarity metrics used to compare representations of documents are also evolving, with numerous proposals differing in computation time or interpretability. In this paper we propose an extension to a specific emerging hybrid document distance metric which combines topic models and word embeddings: the Hierarchical Optimal Topic Transport (HOTT). Specifically, we extend HOTT by using context-enhanced word representations. We validate our approach on public datasets, using the language model BERT for a document categorization task. Results indicate competitive performance of the extended HOTT metric. We furthermore apply the HOTT metric and its extension to support educational media research, with a retrieval task of matching topics in German curricula to educational textbook passages, along with an auxiliary explanatory document representing the dominant topic of the retrieved document. In a user study, our explanation method is preferred over regular topic keywords.
Anthology ID:
2021.findings-emnlp.418
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
4856–4866
URL:
https://aclanthology.org/2021.findings-emnlp.418
DOI:
10.18653/v1/2021.findings-emnlp.418
Cite (ACL):
Sabine Wehnert, Christian Scheel, Simona Szakács-Behling, Maret Nieländer, Patrick Mielke, and Ernesto William De Luca. 2021. HOTTER: Hierarchical Optimal Topic Transport with Explanatory Context Representations. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 4856–4866, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
HOTTER: Hierarchical Optimal Topic Transport with Explanatory Context Representations (Wehnert et al., Findings 2021)
PDF:
https://aclanthology.org/2021.findings-emnlp.418.pdf
Video:
https://aclanthology.org/2021.findings-emnlp.418.mp4
Code:
anybass/hotter