Zhihong Lei
2024
Conformer-Based Speech Recognition On Extreme Edge-Computing Devices
Mingbin Xu | Alex Jin | Sicheng Wang | Mu Su | Tim Ng | Henry Mason | Shiyi Han | Zhihong Lei | Yaqiao Deng | Zhen Huang | Mahesh Krishnamoorthy
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track)
With the increasingly powerful compute capabilities and resources of today's devices, traditionally compute-intensive automatic speech recognition (ASR) has been moving from the cloud to devices to better protect user privacy. However, it is still challenging to implement on-device ASR on resource-constrained devices such as smartphones, smart wearables, and other small home automation devices. In this paper, we propose a series of model architecture adaptations, neural network graph transformations, and numerical optimizations to fit an advanced Conformer-based end-to-end streaming ASR system on resource-constrained devices without accuracy degradation. We achieve speech recognition more than 5.26 times faster than real time (0.19 RTF) on small wearables while minimizing energy consumption and achieving state-of-the-art accuracy. The proposed methods are widely applicable to other transformer-based, server-free AI applications. In addition, we provide a complete theory of optimal pre-normalizers that numerically stabilize layer normalization in any Lp-norm using any floating-point precision.
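The pre-normalizer idea can be illustrated with a short sketch. Layer normalization is invariant to rescaling its input, so dividing each feature vector by a norm of itself before computing the mean and variance leaves the exact-arithmetic output unchanged, while keeping the intermediate squared terms small enough not to overflow in low precision. The code below is a hypothetical illustration, not taken from the paper; it uses the L-infinity norm (max absolute value) as the pre-normalizer for simplicity, which is not necessarily the paper's optimal choice.

```python
import numpy as np

def layer_norm_prenorm(x, gamma, beta, eps=1e-5):
    """LayerNorm with an L-infinity pre-normalizer (illustrative sketch).

    LayerNorm is scale-invariant, so dividing x by max(|x|) first leaves
    the exact-arithmetic result unchanged while keeping every squared
    term <= 1, which cannot overflow in float16.
    """
    # Pre-normalizer: the L-infinity norm of each feature vector.
    scale = np.maximum(np.abs(x).max(axis=-1, keepdims=True),
                       np.asarray(eps, dtype=x.dtype))
    x = x / scale
    mean = x.mean(axis=-1, keepdims=True)
    var = ((x - mean) ** 2).mean(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

# In float16, a naive implementation overflows once |x| exceeds ~256
# (since x**2 > 65504, the float16 maximum); the pre-normalized version
# stays finite on the same input.
x = (np.random.randn(4, 8) * 1000).astype(np.float16)
out = layer_norm_prenorm(x, gamma=np.float16(1.0), beta=np.float16(0.0))
assert np.isfinite(out).all()
```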
2020
Neural Language Modeling for Named Entity Recognition
Zhihong Lei | Weiyue Wang | Christian Dugast | Hermann Ney
Proceedings of the 28th International Conference on Computational Linguistics
Named entity recognition is a key component in various natural language processing systems, and neural architectures provide significant improvements over conventional approaches. Regardless of the word embedding and hidden-layer structures of the networks, a conditional random field (CRF) layer is commonly used for the output. This work proposes using a neural language model as an alternative to the CRF layer, an approach that is more flexible with respect to the size of the corpus. Experimental results show that the proposed system has a significant advantage in training speed, at the cost of only a marginal performance degradation.
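As a rough illustration of the idea, the sketch below replaces a CRF output layer with a small LSTM language model over the tag sequence: each position's score combines the token encoder's emission logits with the tag LM's prediction from the preceding tags. Everything here (the LMTagger name, the convention that tag id 0 serves as BOS, the additive logit combination, all hyperparameters) is an illustrative assumption, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class LMTagger(nn.Module):
    """Illustrative sketch: a BiLSTM tagger whose structured output layer
    is a tag-sequence language model rather than a CRF transition matrix."""

    def __init__(self, vocab_size, num_tags, emb_dim=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, bidirectional=True,
                               batch_first=True)
        self.emission = nn.Linear(2 * hidden, num_tags)
        # Tag-side language model: the CRF replacement.
        self.tag_embed = nn.Embedding(num_tags, emb_dim)
        self.tag_lm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.tag_out = nn.Linear(hidden, num_tags)

    def forward(self, tokens, tags):
        # Emission logits from the word-side encoder.
        enc, _ = self.encoder(self.embed(tokens))
        emit = self.emission(enc)                     # (B, T, num_tags)
        # LM logits: predict tag t from the gold tags < t (teacher forcing).
        prev = torch.roll(tags, shifts=1, dims=1)
        prev[:, 0] = 0                                # assumes tag id 0 = BOS
        lm_h, _ = self.tag_lm(self.tag_embed(prev))
        lm = self.tag_out(lm_h)                       # (B, T, num_tags)
        # Log-linear combination of the two heads, trained per position.
        return nn.functional.cross_entropy((emit + lm).transpose(1, 2), tags)

model = LMTagger(vocab_size=1000, num_tags=9)
tokens = torch.randint(0, 1000, (2, 12))
tags = torch.randint(0, 9, (2, 12))
loss = model(tokens, tags)
loss.backward()
```

In a setup like this, training needs only per-position cross-entropy rather than the CRF's forward-algorithm partition function, which is one plausible source of the training-speed advantage the abstract reports; at inference, the tag LM conditions on previously predicted tags, so decoding is greedy or beam search rather than Viterbi.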