Bhiksha Raj


Sequential Randomized Smoothing for Adversarially Robust Speech Recognition
Raphael Olivier | Bhiksha Raj
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

While Automatic Speech Recognition has been shown to be vulnerable to adversarial attacks, defenses against these attacks are still lagging. Existing, naive defenses can be partially broken with an adaptive attack. In classification tasks, the Randomized Smoothing paradigm has been shown to be effective at defending models. However, it is difficult to apply this paradigm to ASR tasks, due to their complexity and the sequential nature of their outputs. Our paper overcomes some of these challenges by leveraging speech-specific tools like enhancement and ROVER voting to design an ASR model that is robust to perturbations. We apply adaptive versions of state-of-the-art attacks, such as the Imperceptible ASR attack, to our model, and show that our strongest defense is robust to all attacks that use inaudible noise, and can only be broken with very high distortion.


Automatic In-the-wild Dataset Annotation with Deep Generalized Multiple Instance Learning
Joana Correia | Isabel Trancoso | Bhiksha Raj
Proceedings of the Twelfth Language Resources and Evaluation Conference

The automation of the diagnosis and monitoring of speech-affecting diseases, such as depression or Parkinson’s disease, in real-life situations depends on the existence of rich and large datasets that resemble real-life conditions, such as those collected from in-the-wild multimedia repositories like YouTube. However, the cost of manually labeling these large datasets can be prohibitive. In this work, we propose to overcome this problem by automating the annotation process, without requiring any human intervention. We formulate the annotation problem as a Multiple Instance Learning (MIL) problem, and propose a novel solution that is based on end-to-end differentiable neural networks. Our solution has the additional advantage of generalizing the MIL framework to more scenarios where the data is still organized in bags but does not meet the MIL bag label conditions. We demonstrate the performance of the proposed method in labeling the in-the-Wild Speech Medical (WSM) Corpus, using simple textual cues extracted from videos and their metadata. Furthermore, we show the contribution of each type of textual cue to the final model performance, and study the influence of the size of the bags of instances on the difficulty of the learning problem.


An Unsupervised Dynamic Bayesian Network Approach to Measuring Speech Style Accommodation
Mahaveer Jain | John McDonough | Gahgene Gweon | Bhiksha Raj | Carolyn Penstein Rosé
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics


A Comparison of Latent Variable Models For Conversation Analysis
Sourish Chaudhuri | Bhiksha Raj
Proceedings of the SIGDIAL 2011 Conference


A Speech-in List-out Approach to Spoken User Interfaces
Vijay Divi | C. Forlines | Jan Van Gemert | Bhiksha Raj | B. Schmidt-Nielsen | Kent Wittenburg | Joseph Woelfel | Fang-Fang Zhang
Proceedings of HLT-NAACL 2004: Short Papers