Jan Trmal


pdf bib
ConEC: Earnings Call Dataset with Real-world Contexts for Benchmarking Contextual Speech Recognition
Ruizhe Huang | Mahsa Yarmohammadi | Jan Trmal | Jing Liu | Desh Raj | Leibny Paola Garcia | Alexei V. Ivanov | Patrick Ehlen | Mingzhi Yu | Dan Povey | Sanjeev Khudanpur
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Knowing the particular context associated with a conversation can help improving the performance of an automatic speech recognition (ASR) system. For example, if we are provided with a list of in-context words or phrases — such as the speaker’s contacts or recent song playlists — during inference, we can bias the recognition process towards this list. There are many works addressing contextual ASR; however, there is few publicly available real benchmark for evaluation, making it difficult to compare different solutions. To this end, we provide a corpus (“ConEC”) and baselines to evaluate contextual ASR approaches, grounded on real-world applications. The ConEC corpus is based on public-domain earnings calls (ECs) and associated supplementary materials, such as presentation slides, earnings news release as well as a list of meeting participants’ names and affiliations. We demonstrate that such real contexts are noisier than artificially synthesized contexts that contain the ground truth, yet they still make great room for future improvement of contextual ASR technology


pdf bib
Induced Inflection-Set Keyword Search in Speech
Oliver Adams | Matthew Wiesner | Jan Trmal | Garrett Nicolai | David Yarowsky
Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

We investigate the problem of searching for a lexeme-set in speech by searching for its inflectional variants. Experimental results indicate how lexeme-set search performance changes with the number of hypothesized inflections, while ablation experiments highlight the relative importance of different components in the lexeme-set search pipeline and the value of using curated inflectional paradigms. We provide a recipe and evaluation set for the community to use as an extrinsic measure of the performance of inflection generation approaches.


pdf bib
New release of Mixer-6: Improved validity for phonetic study of speaker variation and identification
Eleanor Chodroff | Matthew Maciejewski | Jan Trmal | Sanjeev Khudanpur | John Godfrey
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The Mixer series of speech corpora were collected over several years, principally to support annual NIST evaluations of speaker recognition (SR) technologies. These evaluations focused on conversational speech over a variety of channels and recording conditions. One of the series, Mixer-6, added a new condition, read speech, to support basic scientific research on speaker characteristics, as well as technology evaluation. With read speech it is possible to make relatively precise measurements of phonetic events and features, which can be correlated with the performance of speaker recognition algorithms, or directly used in phonetic analysis of speaker variability. The read speech, as originally recorded, was adequate for large-scale evaluations (e.g., fixed-text speaker ID algorithms) but only marginally suitable for acoustic-phonetic studies. Numerous errors due largely to speaker behavior remained in the corpus, with no record of their locations or rate of occurrence. We undertook the effort to correct this situation with automatic methods supplemented by human listening and annotation. The present paper describes the tools and methods, resulting corrections, and some examples of the kinds of research studies enabled by these enhancements.


pdf bib
A Coarse-Grained Model for Optimal Coupling of ASR and SMT Systems for Speech Translation
Gaurav Kumar | Graeme Blackwood | Jan Trmal | Daniel Povey | Sanjeev Khudanpur
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing