Zhenisbek Assylbekov


2022

From Hyperbolic Geometry Back to Word Embeddings
Zhenisbek Assylbekov | Sultan Nurmukhamedov | Arsen Sheverdin | Thomas Mach
Proceedings of the 7th Workshop on Representation Learning for NLP

We choose random points in the hyperbolic disc and claim that these points are already word representations. However, it remains to be uncovered which point corresponds to which word of the human language of interest. This correspondence can be approximately established using the pointwise mutual information between words together with recent alignment techniques.
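As a rough illustration of the statistical ingredient mentioned above, the sketch below estimates a pointwise mutual information (PMI) matrix from co-occurrence counts, PMI(w, c) = log p(w, c) / (p(w) p(c)); the toy corpus and variable names are illustrative and not taken from the paper, and the alignment step that matches this matrix to the hyperbolic point cloud is omitted.

```python
import numpy as np
from collections import Counter
from itertools import combinations

# Toy corpus; in practice PMI would be estimated from a large corpus.
sentences = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]

vocab = sorted({w for s in sentences for w in s})
idx = {w: i for i, w in enumerate(vocab)}

# Count co-occurrences within a sentence-level window, in both directions.
pair_counts = Counter()
word_counts = Counter()
for s in sentences:
    word_counts.update(s)
    for w, c in combinations(s, 2):
        pair_counts[(w, c)] += 1
        pair_counts[(c, w)] += 1

total_pairs = sum(pair_counts.values())
total_words = sum(word_counts.values())

# PMI(w, c) = log p(w, c) / (p(w) p(c)); left at zero for unseen pairs.
pmi = np.zeros((len(vocab), len(vocab)))
for (w, c), n in pair_counts.items():
    p_wc = n / total_pairs
    p_w = word_counts[w] / total_words
    p_c = word_counts[c] / total_words
    pmi[idx[w], idx[c]] = np.log(p_wc / (p_w * p_c))

print(np.round(pmi, 2))
```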

Speeding Up Entmax
Maxat Tezekbayev | Vassilina Nikoulina | Matthias Gallé | Zhenisbek Assylbekov
Findings of the Association for Computational Linguistics: NAACL 2022

Softmax is the de facto standard for normalizing logits in modern neural networks for language processing. However, because it produces a dense probability distribution, every token in the vocabulary has a nonzero chance of being selected at each generation step, which leads to a variety of reported problems in text generation. The 𝛼-entmax of Peters et al. (2019) solves this problem, but is unfortunately slower than softmax. In this paper, we propose an alternative to 𝛼-entmax that keeps its virtuous characteristics but is as fast as optimized softmax and achieves on-par or better performance on machine translation.
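For contrast with dense softmax, the sketch below implements sparsemax, the 𝛼 = 2 special case of 𝛼-entmax, which assigns exactly zero probability to low-scoring tokens; it only illustrates sparse normalization and is not the faster alternative proposed in the paper.

```python
import numpy as np

def softmax(z):
    """Dense normalization: every logit gets a strictly positive probability."""
    e = np.exp(z - z.max())
    return e / e.sum()

def sparsemax(z):
    """Sparse normalization (the alpha = 2 case of alpha-entmax):
    Euclidean projection of the logits onto the probability simplex."""
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum      # which sorted entries stay nonzero
    k_z = k[support][-1]                     # size of the support
    tau = (cumsum[support][-1] - 1) / k_z    # threshold subtracted from the logits
    return np.maximum(z - tau, 0.0)

logits = np.array([2.0, 1.0, 0.1, -1.0])
print(softmax(logits))    # all entries > 0
print(sparsemax(logits))  # low-scoring entries are exactly 0
```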

2018

Reusing Weights in Subword-Aware Neural Language Models
Zhenisbek Assylbekov | Rustem Takhanov
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

We propose several ways of reusing subword embeddings and other weights in subword-aware neural language models. The proposed techniques do not benefit a competitive character-aware model, but some of them improve the performance of syllable- and morpheme-aware models while showing significant reductions in model sizes. We discover a simple hands-on principle: in a multi-layer input embedding model, layers should be tied consecutively bottom-up if reused at output. Our best morpheme-aware model with properly reused weights beats the competitive word-level model by a large margin across multiple languages and has 20%-87% fewer parameters.
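The simplest instance of weight reuse discussed above is tying the input embedding matrix to the output projection; the PyTorch sketch below shows this for a plain word-level LSTM language model, with illustrative dimensions rather than the paper's subword-aware architecture.

```python
import torch
import torch.nn as nn

class TiedLM(nn.Module):
    """Word-level LSTM language model with input/output weight tying."""
    def __init__(self, vocab_size=10000, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size, bias=False)
        # Reuse the embedding weights at the output layer: the two
        # vocabulary-sized matrices now share a single set of parameters.
        self.out.weight = self.embed.weight

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.out(h)  # logits over the vocabulary

model = TiedLM()
logits = model(torch.randint(0, 10000, (2, 5)))  # (batch=2, seq=5, vocab)
print(logits.shape)
```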

Reproducing and Regularizing the SCRN Model
Olzhas Kabdolov | Zhenisbek Assylbekov | Rustem Takhanov
Proceedings of the 27th International Conference on Computational Linguistics

We reproduce the Structurally Constrained Recurrent Network (SCRN) model and then regularize it using existing, widely used techniques such as naive dropout, variational dropout, and weight tying. We show that, when regularized and optimized appropriately, the SCRN model can achieve performance comparable to the ubiquitous LSTM model on English language modeling, while outperforming it on non-English data.
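One of the regularizers mentioned above, variational dropout, samples a single dropout mask per sequence and reuses it at every time step; the PyTorch sketch below is a generic version of this idea, independent of the SCRN architecture itself.

```python
import torch

def locked_dropout(x, p=0.3, training=True):
    """Variational (locked) dropout: one mask per sequence, shared across time.

    x: tensor of shape (batch, time, features).
    """
    if not training or p == 0.0:
        return x
    # Sample the mask once per sequence; the time dimension is broadcast,
    # so the same units are dropped at every step.
    mask = x.new_empty(x.size(0), 1, x.size(2)).bernoulli_(1 - p) / (1 - p)
    return x * mask

h = torch.randn(4, 7, 16)        # (batch, time, features)
out = locked_dropout(h, p=0.3)
print(out.shape)                 # same shape as the input
```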

Manual vs Automatic Bitext Extraction
Aibek Makazhanov | Bagdat Myrzakhmetov | Zhenisbek Assylbekov
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Syllable-aware Neural Language Models: A Failure to Beat Character-aware Ones
Zhenisbek Assylbekov | Rustem Takhanov | Bagdat Myrzakhmetov | Jonathan N. Washington
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Syllabification does not seem to improve word-level RNN language modeling quality when compared to character-based segmentation. However, our best syllable-aware language model, achieving performance comparable to the competitive character-aware model, has 18%-33% fewer parameters and is trained 1.2-2.2 times faster.

2015

A Statistical Model for Measuring Structural Similarity between Webpages
Zhenisbek Assylbekov | Assulan Nurkas | Inês Russinho Mouga
Proceedings of the International Conference Recent Advances in Natural Language Processing