Florian Schmidt
2020
How does BERT capture semantics? A closer look at polysemous words
David Yenicelik | Florian Schmidt | Yannic Kilcher
Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP
The recent paradigm shift to contextual word embeddings has seen tremendous success across a wide range of downstream tasks. However, little is known about how the emergent relation between context and semantics manifests geometrically. We investigate polysemous words as one particularly prominent instance of semantic organization. Our rigorous quantitative analysis of linear separability and cluster organization in embedding vectors produced by BERT shows that semantics do not surface as isolated clusters but form seamless structures, tightly coupled with sentiment and syntax.
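As a rough illustration of the kind of analysis the abstract describes (a minimal sketch, not the authors' code; the checkpoint, example sentences, and probe are illustrative choices), one can collect BERT's contextual vectors for occurrences of a polysemous word such as "bank" and fit a linear probe to test how well the senses separate:

    import torch
    from sklearn.linear_model import LogisticRegression
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")
    model.eval()

    # Toy corpus: sense 0 = financial institution, sense 1 = river bank.
    sentences = [
        ("She deposited the check at the bank.", 0),
        ("The bank approved our mortgage application.", 0),
        ("We picnicked on the bank of the river.", 1),
        ("Reeds grew thick along the muddy bank.", 1),
    ]

    vectors, labels = [], []
    for text, sense in sentences:
        enc = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state[0]   # (seq_len, 768)
        # Take the contextual vector at the position of "bank".
        tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
        vectors.append(hidden[tokens.index("bank")].numpy())
        labels.append(sense)

    # Linear separability probe over the contextual embeddings.
    probe = LogisticRegression().fit(vectors, labels)
    print("train accuracy:", probe.score(vectors, labels))

A real analysis would use a large sense-annotated corpus and held-out data; the point here is only the pipeline: one contextual vector per occurrence, then a separability or clustering test on top.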
2019
Autoregressive Text Generation Beyond Feedback Loops
Florian Schmidt | Stephan Mandt | Thomas Hofmann
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Autoregressive state transitions, where predictions are conditioned on past predictions, are the predominant choice for both deterministic and stochastic sequential models. However, autoregressive feedback exposes the evolution of the hidden state trajectory to potential biases from well-known train-test discrepancies. In this paper, we combine a non-autoregressive latent state space model with an autoregressive CRF observation model. We argue that such observation models form an interesting middle ground: they express local correlations at the word level while keeping the state evolution non-autoregressive. On unconditional sentence generation, we show performance improvements over RNN and GAN baselines while avoiding some prototypical failure modes of autoregressive models.
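To make the architectural contrast concrete, here is a minimal sketch of the modeling idea (not the paper's implementation; the noise-driven GRUCell transition and the full pairwise potential matrix are illustrative simplifications): the latent state rolls forward without ever seeing a generated word, so train and test trajectories evolve identically, while a linear-chain CRF over the per-step emission scores supplies the word-level correlations.

    import torch
    import torch.nn as nn

    class LatentStateCRF(nn.Module):
        def __init__(self, vocab_size, state_dim=128):
            super().__init__()
            # The state transition takes only noise and the previous state;
            # no generated word is fed back into the trajectory.
            self.transition = nn.GRUCell(state_dim, state_dim)
            self.emission = nn.Linear(state_dim, vocab_size)   # unary scores
            # Pairwise CRF potentials capture local word-word correlations.
            self.pairwise = nn.Parameter(torch.zeros(vocab_size, vocab_size))

        def unaries(self, z0, length):
            """Roll the latent state forward; emit per-step unary scores."""
            states, z = [], z0
            for _ in range(length):
                noise = torch.randn_like(z)    # stochastic transition input
                z = self.transition(noise, z)
                states.append(z)
            return self.emission(torch.stack(states, dim=1))   # (B, T, V)

        def log_likelihood(self, z0, words):
            """log p(words | z_{1:T}) under a linear-chain CRF."""
            B, T = words.shape
            scores = self.unaries(z0, T)                       # (B, T, V)
            # Score of the observed sequence: unary plus pairwise terms.
            gold = scores.gather(2, words.unsqueeze(-1)).squeeze(-1).sum(1)
            gold = gold + self.pairwise[words[:, :-1], words[:, 1:]].sum(1)
            # Forward algorithm for the log-partition function.
            alpha = scores[:, 0]                               # (B, V)
            for t in range(1, T):
                alpha = torch.logsumexp(
                    alpha.unsqueeze(2) + self.pairwise.unsqueeze(0), dim=1
                ) + scores[:, t]
            return gold - torch.logsumexp(alpha, dim=1)

Note that a dense vocab-by-vocab pairwise matrix is only tractable for small vocabularies; the sketch is meant to show where the word-level autoregression lives (in the observation model) and where it does not (in the state transition).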
Generalization in Generation: A closer look at Exposure Bias
Florian Schmidt
Proceedings of the 3rd Workshop on Neural Generation and Translation
Exposure bias refers to the train-test discrepancy that seemingly arises when an autoregressive generative model uses only ground-truth contexts at training time but generated ones at test time. We separate the contributions of the learning framework and the model to clarify the debate about its consequences, and we review proposed countermeasures. In this light, we argue that generalization is the underlying property to address, and we propose unconditional generation as its fundamental benchmark. Finally, we combine latent variable modeling with a recent formulation of exploration in reinforcement learning to obtain a rigorous handling of true and generated contexts. Results on language modeling and variational sentence auto-encoding confirm the model's generalization capability.
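The discrepancy itself is easy to state in code. The following minimal sketch contrasts the two context regimes (the model interface is an assumption: any network mapping a batch-by-time token matrix to per-step vocabulary logits would do):

    import torch

    def teacher_forced_loss(model, tokens):
        """Training regime: every step is conditioned on the gold prefix."""
        logits = model(tokens[:, :-1])                 # (B, T-1, V)
        logp = torch.log_softmax(logits, dim=-1)
        return -logp.gather(2, tokens[:, 1:].unsqueeze(-1)).mean()

    def free_running_sample(model, bos, steps):
        """Test regime: every step sees the model's own generated prefix."""
        seq = bos                                      # (B, 1) start tokens
        for _ in range(steps):
            logits = model(seq)[:, -1]                 # next-token scores
            nxt = torch.multinomial(torch.softmax(logits, -1), 1)
            seq = torch.cat([seq, nxt], dim=1)         # feed prediction back
        return seq

Under teacher forcing the model never has to recover from its own mistakes; at sampling time every generated token becomes part of the next step's context, which is exactly the gap the latent-variable and exploration machinery above is meant to handle rigorously.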