Taiga Ishii


2023

pdf bib
Tree-shape Uncertainty for Analyzing the Inherent Branching Bias of Unsupervised Parsing Models
Taiga Ishii | Yusuke Miyao
Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL)

This paper presents the formalization of tree-shape uncertainty that enables us to analyze the inherent branching bias of unsupervised parsing models using raw texts alone. Previous work analyzed the branching bias of unsupervised parsing models by comparing the outputs of trained parsers with gold syntactic trees. However, such approaches do not consider the fact that texts can be generated by different grammars with different syntactic trees, possibly failing to clearly separate the inherent bias of the model and the bias in train data learned by the model. To this end, we formulate tree-shape uncertainty and derive sufficient conditions that can be used for creating texts that are expected to contain no biased information on branching. In the experiment, we show that training parsers on such unbiased texts can effectively detect the branching bias of existing unsupervised parsing models. Such bias may depend only on the algorithm, or it may depend on seemingly unrelated dataset statistics such as sequence length and vocabulary size.
Search
Co-authors
Venues