A Study on Using Different Audio Lengths in Transfer Learning for Improving Chainsaw Sound Recognition

Jia-Wei Chang, Zhong-Yun Hu


Abstract
Chainsaw sound recognition is challenging because chainsaw sounds are complex and mountain environments are noisy. This study examines how the length of the audio affects the accuracy of the trained models. To this end, it uses LeNet, a simple model with few parameters, and adopts average pooling so that the models can accept audio of any length. The performance comparison focuses on the effect of different audio lengths and further tests transfer learning from short to long audio and from long to short audio. In the experiments, the models were trained on the ESC-10 dataset and validated on a self-collected chainsaw-audio dataset. The results show that (a) the models trained with 1 s, 3 s, and 5 s audio reach accuracies of 74% to 78%, 74% to 77%, and 79% to 83%, respectively, on the self-collected dataset; (b) transfer learning significantly improves the generalization of these models, which reach accuracies of 85.28%, 88.67%, and 91.8%; and (c) in transfer learning, models transferred from short to long audio outperform those transferred from long to short audio, with a difference of 14% in accuracy on 5 s chainsaw audio.
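The abstract's key architectural point is that average pooling lets one model accept audio of any length. The following is a minimal, pure-Python sketch of that idea under stated assumptions; it is not the authors' LeNet. It assumes the input has already been turned into a feature matrix (one row per channel, one column per time frame), and every name, layer size, and parameter value below (`split_into_clips`, `forward`, the kernel and weights) is hypothetical and chosen only for illustration. How the 1 s and 3 s clips were actually obtained is not stated in the abstract; non-overlapping cutting is shown here only as one plausible option.

```python
from typing import List

# Hypothetical feature map: one row per channel, one column per time frame.
Matrix = List[List[float]]


def split_into_clips(samples: List[float], sample_rate: int, clip_seconds: float) -> List[List[float]]:
    """Cut a waveform into non-overlapping clips of `clip_seconds`; a short tail is dropped."""
    clip_len = int(round(sample_rate * clip_seconds))
    if clip_len <= 0:
        raise ValueError("clip length must be positive")
    return [samples[i:i + clip_len] for i in range(0, len(samples) - clip_len + 1, clip_len)]


def conv1d(x: Matrix, kernel: List[float]) -> Matrix:
    """'Valid' 1-D cross-correlation along time (as in deep-learning conv layers), same kernel per channel."""
    k = len(kernel)
    return [[sum(row[t + j] * kernel[j] for j in range(k)) for t in range(len(row) - k + 1)]
            for row in x]


def relu(x: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in x]


def global_average_pool(x: Matrix) -> List[float]:
    """Mean over time per channel: output length = number of channels, whatever the duration."""
    return [sum(row) / len(row) for row in x]


def linear(v: List[float], weights: Matrix, bias: List[float]) -> List[float]:
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * vi for w, vi in zip(w_row, v)) + b for w_row, b in zip(weights, bias)]


def forward(features: Matrix, kernel: List[float], weights: Matrix, bias: List[float]) -> List[float]:
    """Hypothetical conv -> ReLU -> global average pooling -> linear classifier."""
    pooled = global_average_pool(relu(conv1d(features, kernel)))
    return linear(pooled, weights, bias)


if __name__ == "__main__":
    # The same parameters applied to a "1 s" (50-frame) and a "5 s" (250-frame) feature map.
    kernel = [0.5, 0.5]
    weights = [[1.0, -1.0], [0.5, 0.5]]  # 2 channels -> 2 classes (e.g. chainsaw / other)
    bias = [0.0, 0.1]
    short_clip = [[1.0] * 50, [0.0] * 50]
    long_clip = [[1.0] * 250, [0.0] * 250]
    print(forward(short_clip, kernel, weights, bias))
    print(forward(long_clip, kernel, weights, bias))
```

Because the pooled vector's size depends only on the number of channels, the parameter shapes do not depend on clip duration. Under this assumption, weights trained on one clip length can be loaded unchanged and fine-tuned on another, which is what makes short-to-long and long-to-short transfer possible without modifying the architecture.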
Anthology ID:
2022.rocling-1.9
Volume:
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)
Month:
November
Year:
2022
Address:
Taipei, Taiwan
Editors:
Yung-Chun Chang, Yi-Chin Huang
Venue:
ROCLING
Publisher:
The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Pages:
67–74
Language:
Chinese
URL:
https://aclanthology.org/2022.rocling-1.9
Cite (ACL):
Jia-Wei Chang and Zhong-Yun Hu. 2022. A Study on Using Different Audio Lengths in Transfer Learning for Improving Chainsaw Sound Recognition. In Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022), pages 67–74, Taipei, Taiwan. The Association for Computational Linguistics and Chinese Language Processing (ACLCLP).
Cite (Informal):
A Study on Using Different Audio Lengths in Transfer Learning for Improving Chainsaw Sound Recognition (Chang & Hu, ROCLING 2022)
PDF:
https://aclanthology.org/2022.rocling-1.9.pdf
Data
ESC-50