Georgi Shopov
2024
Consistent Bidirectional Language Modelling: Expressive Power and Representational Conciseness
Georgi Shopov, Stefan Gerdjikov
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
The inability to utilise future contexts and the pre-determined left-to-right generation order are major limitations of unidirectional language models. Bidirectionality has been introduced to address these deficiencies. However, a crucial shortcoming of bidirectional language models is the potential inconsistency of their conditional distributions. This fundamental flaw greatly diminishes their applicability and hinders their capability of tractable sampling and likelihood computation. In this work, we introduce a class of bidirectional language models, called latent language models, that are consistent by definition and can be efficiently used both for generation and scoring of sequences. We define latent language models based on the well-understood formalism of bisequential decompositions from automata theory. This formal correspondence allows us to precisely characterise the abilities and limitations of a subclass of latent language models, called rational language models. As a result, we obtain that latent language models are exponentially more concise and significantly more expressive than unidirectional language models.
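The notion of inconsistent conditional distributions can be made concrete with a toy check. This sketch is not the latent language model from the paper; it is a generic illustration, assuming strictly positive distributions over strings of length 2 on a hypothetical alphabet {a, b}. Conditionals derived from one joint distribution always pass a standard compatibility test (the ratio p(x1 | x2) / p(x2 | x1) must factorise into a function of x1 times a function of x2), whereas two separately specified conditionals can fail it. The helper names `conditionals_from_joint` and `is_consistent` are hypothetical.

```python
import math
from itertools import product

# Toy setting (an assumption for illustration): strings of length 2 over {a, b}.
ALPHABET = ("a", "b")
PAIRS = list(product(ALPHABET, repeat=2))  # all (x1, x2)


def conditionals_from_joint(joint):
    """Derive both full conditionals from one joint table keyed by (x1, x2).

    Returns two dicts keyed by (x1, x2):
      left[(x1, x2)]  = p(X1 = x1 | X2 = x2)
      right[(x1, x2)] = p(X2 = x2 | X1 = x1)
    """
    p_x1 = {s: sum(joint[(s, t)] for t in ALPHABET) for s in ALPHABET}
    p_x2 = {t: sum(joint[(s, t)] for s in ALPHABET) for t in ALPHABET}
    left = {(s, t): joint[(s, t)] / p_x2[t] for s, t in PAIRS}
    right = {(s, t): joint[(s, t)] / p_x1[s] for s, t in PAIRS}
    return left, right


def is_consistent(left, right):
    """Check whether two strictly positive conditionals admit a common joint.

    They do iff r(x1, x2) = left / right factorises as u(x1) * v(x2),
    i.e. r(s, t) * r(s2, t2) == r(s, t2) * r(s2, t) for all s, s2, t, t2.
    """
    r = {k: left[k] / right[k] for k in PAIRS}
    for (s, t), (s2, t2) in product(PAIRS, repeat=2):
        if not math.isclose(r[(s, t)] * r[(s2, t2)], r[(s, t2)] * r[(s2, t)]):
            return False
    return True


# Conditionals obtained from a single joint are consistent by construction.
joint = {("a", "a"): 0.4, ("a", "b"): 0.1, ("b", "a"): 0.2, ("b", "b"): 0.3}
left, right = conditionals_from_joint(joint)
print(is_consistent(left, right))  # True

# Independently specified conditionals need not be: here X1 ignores X2,
# while X2 depends strongly on X1, so no joint distribution has both.
left_bad = {("a", "a"): 0.9, ("b", "a"): 0.1, ("a", "b"): 0.9, ("b", "b"): 0.1}
right_bad = {("a", "a"): 0.9, ("a", "b"): 0.1, ("b", "a"): 0.1, ("b", "b"): 0.9}
print(is_consistent(left_bad, right_bad))  # False
```

A model that defines a single joint distribution and reads its conditionals off it avoids the second situation entirely, which is the sense in which consistency can hold by definition.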
2019
Towards Accurate Text Verbalization for ASR Based on Audio Alignment
Diana Geneva, Georgi Shopov
Proceedings of the Student Research Workshop Associated with RANLP 2019
Verbalization of non-lexical linguistic units plays an important role in language modeling for automatic speech recognition (ASR) systems. Most verbalization methods require valuable resources such as ground truth, large training corpora and expert knowledge, which are often unavailable. On the other hand, a considerable amount of audio data, along with its transcriptions, is freely available on the Internet and could be utilized for the task of verbalization. This paper presents a methodology for accurate verbalization of audio transcriptions based on phone-level alignment between the transcriptions and their corresponding audio recordings. Compared with a more general rule-based verbalization method, this approach yields a significant improvement in the ASR recognition of non-lexical units. In evaluating this approach, we also expose the indirect influence of verbalization accuracy on the quality of acoustic models trained on automatically derived speech corpora.
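The alignment-based idea can be illustrated, under heavy simplification, by choosing among candidate verbalizations the one whose pronunciation best matches a phone sequence recognized from the audio. This is a hypothetical sketch, not the paper's methodology: the toy lexicon, the observed phone sequence, and the functions `phones`, `edit_distance` and `pick_verbalization` are assumptions, and plain Levenshtein distance stands in for a real phone-level aligner.

```python
from typing import Dict, List, Sequence

# Hypothetical toy pronunciation lexicon (simplified ARPAbet-like phones).
LEXICON: Dict[str, List[str]] = {
    "nineteen": ["n", "ay", "n", "t", "iy", "n"],
    "ninety": ["n", "ay", "n", "t", "iy"],
    "five": ["f", "ay", "v"],
    "one": ["w", "ah", "n"],
    "thousand": ["th", "aw", "z", "ah", "n", "d"],
    "nine": ["n", "ay", "n"],
    "hundred": ["hh", "ah", "n", "d", "r", "ah", "d"],
}


def phones(verbalization: str) -> List[str]:
    """Concatenate the lexicon pronunciations of the words in a verbalization."""
    return [p for word in verbalization.split() for p in LEXICON[word]]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute or match
        prev = cur
    return prev[-1]


def pick_verbalization(candidates: List[str], observed: Sequence[str]) -> str:
    """Return the candidate whose phones are closest to the observed phones."""
    return min(candidates, key=lambda c: edit_distance(phones(c), observed))


# Hypothetical candidates for the transcribed token "1995" and a slightly
# noisy phone sequence assumed to come from the corresponding audio.
candidates = ["nineteen ninety five", "one thousand nine hundred ninety five"]
observed = ["n", "ay", "n", "t", "iy", "n", "ay", "n", "iy", "f", "ay", "v"]
print(pick_verbalization(candidates, observed))  # nineteen ninety five
```

A real system would obtain the observed phones from an acoustic model and would need pronunciations (from a lexicon or grapheme-to-phoneme conversion) for every candidate word.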