This paper aims to provide an unsupervised modelling approach that allows for a more flexible representation of text embeddings. It jointly encodes the words and the paragraphs as individual matrices of arbitrary column dimension with unit Frobenius norm. The representation is also linguistically motivated with the introduction of a metric for the ambient space in which we train the embeddings that calculates the similarity between matrices of unequal number of columns. Thus, the proposed modelling and the novel similarity metric exploits the matrix structure of embeddings. We then go on to show that the same matrices can be reshaped into vectors of unit norm and transform our problem into an optimization problem in a spherical manifold for optimization simplicity. Given the total number of matrices we are dealing with, which is equal to the vocab size plus the total number of documents in the corpus, this makes the training of an otherwise expensive non-linear model extremely efficient. We also quantitatively verify the quality of our text embeddings by showing that they demonstrate improved results in document classification, document clustering and semantic textual similarity benchmark tests.
We propose a geometric framework for learning meta-embeddings of words from different embedding sources. Our framework transforms the embeddings into a common latent space, where, for example, simple averaging or concatenation of different embeddings (of a given word) is more amenable. The proposed latent space arises from two particular geometric transformations - source embedding specific orthogonal rotations and a common Mahalanobis metric scaling. Empirical results on several word similarity and word analogy benchmarks illustrate the efficacy of the proposed framework.
We propose a novel manifold based geometric approach for learning unsupervised alignment of word embeddings between the source and the target languages. Our approach formulates the alignment learning problem as a domain adaptation problem over the manifold of doubly stochastic matrices. This viewpoint arises from the aim to align the second order information of the two language spaces. The rich geometry of the doubly stochastic manifold allows to employ efficient Riemannian conjugate gradient algorithm for the proposed formulation. Empirically, the proposed approach outperforms state-of-the-art optimal transport based approach on the bilingual lexicon induction task across several language pairs. The performance improvement is more significant for distant language pairs.
Recent progress on unsupervised cross-lingual embeddings in the bilingual setting has given the impetus to learning a shared embedding space for several languages. A popular framework to solve the latter problem is to solve the following two sub-problems jointly: 1) learning unsupervised word alignment between several language pairs, and 2) learning how to map the monolingual embeddings of every language to shared multilingual space. In contrast, we propose a simple approach by decoupling the above two sub-problems and solving them separately, one after another, using existing techniques. We show that this proposed approach obtains surprisingly good performance in tasks such as bilingual lexicon induction, cross-lingual word similarity, multilingual document classification, and multilingual dependency parsing. When distant languages are involved, the proposed approach shows robust behavior and outperforms existing unsupervised multilingual word embedding approaches.
We propose a novel geometric approach for learning bilingual mappings given monolingual embeddings and a bilingual dictionary. Our approach decouples the source-to-target language transformation into (a) language-specific rotations on the original embeddings to align them in a common, latent space, and (b) a language-independent similarity metric in this common space to better model the similarity between the embeddings. Overall, we pose the bilingual mapping problem as a classification problem on smooth Riemannian manifolds. Empirically, our approach outperforms previous approaches on the bilingual lexicon induction and cross-lingual word similarity tasks. We next generalize our framework to represent multiple languages in a common latent space. Language-specific rotations for all the languages and a common similarity metric in the latent space are learned jointly from bilingual dictionaries for multiple language pairs. We illustrate the effectiveness of joint learning for multiple languages in an indirect word translation setting.