Speech Dereverberation for Robust ASR using Deep learning Techniques

K. Sriram; Hemanth S.

Research Article


International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 186 - Issue 46

Published: November 2024

10.5120/ijca2024924095

K. Sriram, Hemanth S. Speech Dereverberation for Robust ASR using Deep learning Techniques. International Journal of Computer Applications. 186, 46 (November 2024), 19-23. DOI=10.5120/ijca2024924095

@article{ 10.5120/ijca2024924095,
  author    = { K. Sriram and Hemanth S. },
  title     = { Speech Dereverberation for Robust ASR using Deep learning Techniques },
  journal   = { International Journal of Computer Applications },
  year      = { 2024 },
  volume    = { 186 },
  number    = { 46 },
  pages     = { 19-23 },
  doi       = { 10.5120/ijca2024924095 },
  publisher = { Foundation of Computer Science (FCS), NY, USA }
}

%0 Journal Article
%D 2024
%A K. Sriram
%A Hemanth S.
%T Speech Dereverberation for Robust ASR using Deep learning Techniques
%J International Journal of Computer Applications
%V 186
%N 46
%P 19-23
%R 10.5120/ijca2024924095
%I Foundation of Computer Science (FCS), NY, USA

Abstract

This paper presents a comparative study of deep learning techniques for speech dereverberation, with the aim of identifying the most effective approach. In acoustics, reverberation (or reverb) is the persistence of sound after the sound is produced. It arises when a sound or signal reflects off many nearby surfaces, which may include furniture, people, or even the surrounding air. The reflections build up and then decay. This is most noticeable when the sound source stops but the reflections continue, their amplitude decreasing until it reaches zero. Deep learning is essentially a neural network with three or more layers. These networks attempt to simulate the behavior of the human brain, without matching it exactly, which allows them to "learn" from large quantities of data. Although a neural network with a single layer can still produce approximate predictions, additional hidden layers help to optimize and refine accuracy. Deep learning techniques, including UNet, GANs, and LSTM, are implemented in this paper to study speech dereverberation. In speech, reverberation degrades the whole signal because reflections of the target signal overlap with it, which lowers speech quality. The objective is to enhance the speech signal by removing this reverberation.
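The build-up and decay of reflections described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes a toy room impulse response made of a direct path followed by random reflections with an exponentially decaying envelope, and it convolves a short tone burst with that response. The names synthetic_rir, convolve, and energy, and the decay constant, are hypothetical choices made only for illustration.

```python
import math
import random


def synthetic_rir(length, decay, seed=0):
    """Toy room impulse response (an assumption, not measured data):
    a direct path followed by random reflections whose envelope
    decays exponentially with the sample index."""
    rng = random.Random(seed)
    rir = [rng.uniform(-1.0, 1.0) * math.exp(-decay * n) for n in range(length)]
    rir[0] = 1.0  # direct path from source to microphone
    return rir


def convolve(signal, kernel):
    """Plain time-domain convolution: every source sample triggers
    a scaled, delayed copy of the impulse response."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def energy(samples):
    """Sum of squared sample values."""
    return sum(x * x for x in samples)


# Dry source: a 200-sample tone burst that then cuts out.
dry = [math.sin(2 * math.pi * 0.05 * n) for n in range(200)]
rir = synthetic_rir(length=400, decay=0.02)
wet = convolve(dry, rir)

# Everything after the source stops is pure reverberant tail.
tail = wet[len(dry):]
early_tail = energy(tail[:100])
late_tail = energy(tail[-100:])

print("samples in reverberant tail:", len(tail))
print("energy just after the source stops:", early_tail)
print("energy at the end of the tail:", late_tail)
```

In this toy setting the reverberant signal continues for hundreds of samples after the dry burst has ended, and the energy at the end of the tail is much lower than the energy just after the source stops. A dereverberation model is trained to recover something close to dry from wet.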

References

K. Kinoshita et al., "The REVERB Challenge: A Benchmark Task for Reverberation-Robust ASR Techniques," in New Era for Robust Speech Recognition, Springer, 2017.
O. Ernst, S. E. Chazan, S. Gannot and J. Goldberger, "Speech Dereverberation Using Fully Convolutional Networks," 2018 26th European Signal Processing Conference (EUSIPCO), 2018, pp. 390-394, doi: 10.23919/EUSIPCO.2018.8553141.
Y. Zhao, Z. Wang and D. Wang, "Two-Stage Deep Learning for Noisy-Reverberant Speech Enhancement", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 1, pp. 53-62, 2019. Available: 10.1109/taslp.2018.2870725.
T. Nakatani, T. Yoshioka, K. Kinoshita, M. Miyoshi and B. -H. Juang, "Speech Dereverberation Based on Variance-Normalized Delayed Linear Prediction," in IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 7, pp. 1717-1731, Sept. 2010, doi: 10.1109/TASL.2010.2052251.

Index Terms

Computer Science

Information Sciences

Deep Learning techniques

GAN

acoustics

speech

dereverberation

Keywords

UNet, GAN, dereverberation