Dependency-based Mixture Language Models

Zhixian Yang; Xiaojun Wan

doi:10.18653/v1/2022.acl-long.535

Abstract

Various models have been proposed to incorporate knowledge of syntactic structures into neural language models. However, previous work has relied heavily on elaborate components designed for a specific language model, usually a recurrent neural network (RNN), which makes these approaches unwieldy in practice to fit into other neural language models, such as Transformer and GPT-2. In this paper, we introduce Dependency-based Mixture Language Models. In detail, we first train neural language models with a novel dependency modeling objective to learn the probability distribution of future dependent tokens given the context. We then formulate the next-token probability by mixing the previous dependency modeling probability distributions with self-attention. Extensive experiments and human evaluations show that our method can be easily and effectively applied to different neural language models while improving neural text generation on various tasks.
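
The mixing step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the page gives only a one-sentence description, so the sketch assumes that each context position j yields a dependency modeling distribution over the vocabulary and that attention scores, normalized with a softmax, serve as mixture weights. The names `mix_next_token_distribution`, `dep_dists`, and `attn_scores` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_next_token_distribution(dep_dists, attn_scores):
    """Mix per-position distributions into one next-token distribution.

    dep_dists[j]   : distribution over the vocabulary predicted at context
                     position j (assumed output of a dependency modeling head).
    attn_scores[j] : unnormalized attention score for position j (assumed).
    """
    if len(dep_dists) != len(attn_scores):
        raise ValueError("need one attention score per context position")
    weights = softmax(attn_scores)  # mixture weights sum to 1
    vocab_size = len(dep_dists[0])
    mixed = [0.0] * vocab_size
    for w, dist in zip(weights, dep_dists):
        for v, p in enumerate(dist):
            mixed[v] += w * p  # weighted sum of the per-position distributions
    return mixed


# Toy example: two context positions, a three-word vocabulary.
dep_dists = [
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
]
attn_scores = [0.0, 0.0]  # equal scores give equal mixture weights
mixed = mix_next_token_distribution(dep_dists, attn_scores)
```

Because the weights sum to one and each input is a valid distribution, the mixed result is also a valid distribution. The formulation actually used is given in the paper itself.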

Anthology ID: 2022.acl-long.535
Volume: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 7758–7773
URL: https://aclanthology.org/2022.acl-long.535
DOI: 10.18653/v1/2022.acl-long.535
Cite (ACL): Zhixian Yang and Xiaojun Wan. 2022. Dependency-based Mixture Language Models. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7758–7773, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): Dependency-based Mixture Language Models (Yang & Wan, ACL 2022)
PDF: https://aclanthology.org/2022.acl-long.535.pdf
Software: 2022.acl-long.535.software.zip
Video: https://aclanthology.org/2022.acl-long.535.mp4
Code: fadedcosine/dependency-guided-neural-text-generation
Data: Penn Treebank, ROCStories
