Although contextualized embeddings generated from large-scale pre-trained models perform well in many tasks, traditional static embeddings (e.g., Skip-gram, Word2Vec) still play an important role in low-resource and lightweight settings due to their low computational cost, ease of deployment, and stability. In this paper, we aim to improve word embeddings by 1) incorporating more contextual information from existing pre-trained models into the Skip-gram framework, which we call Context-to-Vec; 2) proposing a post-processing retrofitting method for static embeddings independent of training by employing priori synonym knowledge and weighted vector distribution. Through extrinsic and intrinsic tasks, our methods are well proven to outperform the baselines by a large margin.
One of the main bottlenecks in developing discourse dependency parsers is the lack of annotated training data. A potential solution is to utilize abundant unlabeled data by using unsupervised techniques, but there is so far little research in unsupervised discourse dependency parsing. Fortunately, unsupervised syntactic dependency parsing has been studied by decades, which could potentially be adapted for discourse parsing. In this paper, we propose a simple yet effective method to adapt unsupervised syntactic dependency parsing methodology for unsupervised discourse dependency parsing. We apply the method to adapt two state-of-the-art unsupervised syntactic dependency parsing methods. Experimental results demonstrate that our adaptation is effective. Moreover, we extend the adapted methods to the semi-supervised and supervised setting and surprisingly, we find that they outperform previous methods specially designed for supervised discourse parsing. Further analysis shows our adaptations result in superiority not only in parsing accuracy but also in time and space efficiency.
Mannual annotation for dependency parsing is both labourious and time costly, resulting in the difficulty to learn practical dependency parsers for many languages due to the lack of labelled training corpora. To compensate for the scarcity of labelled data, semi-supervised dependency parsing methods are developed to utilize unlabelled data in the training procedure of dependency parsers. In previous work, the autoencoder framework is a prevalent approach for the utilization of unlabelled data. In this framework, training sentences are reconstructed from a decoder conditioned on dependency trees predicted by an encoder. The tree structure requirement brings challenges for both the encoder and the decoder. Sophisticated techniques are employed to tackle these challenges at the expense of model complexity and approximations in encoding and decoding. In this paper, we propose a model based on the variational autoencoder framework. By relaxing the tree constraint in both the encoder and the decoder during training, we make the learning of our model fully arc-factored and thus circumvent the challenges brought by the tree constraint. We evaluate our model on datasets across several languages and the results demonstrate the advantage of our model over previous approaches in both parsing accuracy and speed.
The key to multilingual grammar induction is to couple grammar parameters of different languages together by exploiting the similarity between languages. Previous work relies on linguistic phylogenetic knowledge to specify similarity between languages. In this work, we propose a novel universal grammar induction approach that represents language identities with continuous vectors and employs a neural network to predict grammar parameters based on the representation. Without any prior linguistic phylogenetic knowledge, we automatically capture similarity between languages with the vector representations and softly tie the grammar parameters of different languages. In our experiments, we apply our approach to 15 languages across 8 language families and subfamilies in the Universal Dependency Treebank dataset, and we observe substantial performance gain on average over monolingual and multilingual baselines.