Padmapriya Mohankumar
2022
IMFinE: An Integrated BERT-CNN-BiGRU Model for Mental Health Detection in Financial Context on Textual Data
Ashraf Kamal
|
Padmapriya Mohankumar
|
Vishal K Singh
Proceedings of the 19th International Conference on Natural Language Processing (ICON)
Nowadays, mental health is a global issue. It is a pervasive phenomenon on online social network platforms, where it is observed in varied categories such as depression, suicide, and stress. Hence, the mental health detection problem is receiving continuous attention among computational linguistics researchers. At the same time, public emotions and reactions play a significant role in the financial domain, and the issue of mental health is directly associated with it. In this paper, we propose a new study to detect mental health in a financial context. It starts with a two-step data filtration process to prepare the mental health dataset in a financial context. A new model called IMFinE is introduced. It consists of an input layer, followed by two relevant BERT embedding layers, a convolutional neural network, a bidirectional gated recurrent unit, and finally, dense and output layers. The empirical evaluation of the proposed model is performed on Reddit datasets, and it shows impressive results in terms of precision, recall, and F-score. It also outperforms relevant state-of-the-art and baseline methods. To the best of our knowledge, this is the first study on mental health detection in a financial context.
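To make the layer order in the IMFinE abstract concrete, the sketch below traces tensor shapes through such a stack. It is only an illustration: the abstract gives no hyperparameters, so every size here (CONV_FILTERS, KERNEL_SIZE, GRU_HIDDEN), the feature-wise concatenation of the two BERT embeddings, and the single-unit output are assumptions, and trace_shapes is a hypothetical helper rather than the authors' code.

```python
# Hypothetical sizes: the abstract does not give any of these values.
BERT_DIM = 768        # hidden size of a BERT-base encoder
CONV_FILTERS = 128    # assumed number of 1-D convolution filters
KERNEL_SIZE = 3       # assumed convolution width (stride 1, no padding)
GRU_HIDDEN = 64       # assumed hidden units per GRU direction


def trace_shapes(batch: int, seq_len: int) -> list[tuple[str, tuple[int, ...]]]:
    """Return (layer name, output shape) pairs for one forward pass."""
    shapes = [("input token ids", (batch, seq_len))]
    # Assumption: the two BERT embeddings are concatenated feature-wise.
    shapes.append(("two BERT embeddings, concatenated", (batch, seq_len, 2 * BERT_DIM)))
    # An unpadded 1-D convolution shortens the sequence by KERNEL_SIZE - 1.
    conv_len = seq_len - KERNEL_SIZE + 1
    shapes.append(("CNN", (batch, conv_len, CONV_FILTERS)))
    # Bidirectional GRU: final forward and backward states are concatenated.
    shapes.append(("BiGRU", (batch, 2 * GRU_HIDDEN)))
    shapes.append(("dense", (batch, GRU_HIDDEN)))
    # Assumption: one output unit for a binary detection decision.
    shapes.append(("output", (batch, 1)))
    return shapes


if __name__ == "__main__":
    for name, shape in trace_shapes(batch=8, seq_len=128):
        print(f"{name:<36} {shape}")
```

The point of the trace is the ordering: the convolution sees the full token sequence and extracts local n-gram features, while the BiGRU then summarizes that shorter feature sequence in both directions before classification.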
Methods to Optimize Wav2Vec with Language Model for Automatic Speech Recognition in Resource Constrained Environment
Vaibhav Haswani
|
Padmapriya Mohankumar
Proceedings of the 19th International Conference on Natural Language Processing (ICON)
Automatic Speech Recognition (ASR) in a resource-constrained environment is a complex task, since most state-of-the-art models combine multilayered convolutional neural networks (CNNs) with Transformer models, which themselves require substantial resources such as GPUs or TPUs for training as well as inference. The accuracy of an ASR system depends on how efficiently the acoustic model translates phonemes to words and how well the language model corrects context. However, inference performance is also an important aspect, and it mostly depends on the available resources. Moreover, most ASR models use Transformers at their core, and one caveat of Transformers is that they can usually handle only a finite sequence length, either because they use position encodings or simply because the cost of attention is O(n²) in sequence length, meaning that very long sequences explode in complexity and memory. As a result, the system cannot be run on finite hardware, even a very high-end GPU: running inference with Wav2Vec on even a one-hour audio file will crash the system. In this paper, we used some state-of-the-art methods to optimize the Wav2Vec model for better prediction accuracy on resource-constrained systems. In addition, we have performed tests with other SOTA models such as Citrinet and QuartzNet for comparative analysis.
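The O(n²) argument in the Wav2Vec abstract can be checked with a little arithmetic. The sketch below assumes 16 kHz audio, one encoder frame per 320 samples (about 20 ms, as in wav2vec 2.0), and float32 attention scores; none of these figures appear in the abstract. It also shows fixed-size overlapping windows as one generic way to bound n. That workaround is included only for illustration and is not necessarily among the methods the paper uses; attention_matrix_bytes and chunk_bounds are hypothetical helper names.

```python
# Assumptions (not from the abstract): 16 kHz audio, one encoder frame per
# 320 samples (about 20 ms), float32 attention scores.
SAMPLE_RATE = 16_000
SAMPLES_PER_FRAME = 320
BYTES_PER_SCORE = 4


def attention_matrix_bytes(duration_s: float) -> int:
    """Bytes for one n x n attention score matrix (one head, one layer)."""
    n_frames = int(duration_s * SAMPLE_RATE) // SAMPLES_PER_FRAME
    return n_frames * n_frames * BYTES_PER_SCORE


def chunk_bounds(n_samples: int, chunk: int, overlap: int) -> list[tuple[int, int]]:
    """Split [0, n_samples) into windows of at most `chunk` samples,
    where consecutive windows share `overlap` samples."""
    if not 0 <= overlap < chunk:
        raise ValueError("overlap must be in [0, chunk)")
    step = chunk - overlap
    bounds = []
    start = 0
    while start < n_samples:
        end = min(start + chunk, n_samples)
        bounds.append((start, end))
        if end == n_samples:
            break
        start += step
    return bounds


if __name__ == "__main__":
    for seconds in (10, 60, 3600):
        gib = attention_matrix_bytes(seconds) / 2**30
        print(f"{seconds:>5} s -> {gib:.4f} GiB per head per layer")
    # 30 s windows with 5 s of overlap over one hour of audio
    windows = chunk_bounds(3600 * SAMPLE_RATE, 30 * SAMPLE_RATE, 5 * SAMPLE_RATE)
    print(len(windows), "chunks")
```

Under these assumptions, one hour of audio is 180,000 frames, so a single attention matrix needs 129.6 GB before counting heads, layers, or activations, whereas a 30-second window needs only 9 MB. The overlap gives each window some context at its edges, at the cost of transcribing the shared region twice and having to merge the results.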