Mel-Scaled Autoregressive (Mel-AR) Model based Voice Activity Detection using Likelihood Ratio Measure

M. Babul Islam

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 182 - Issue 45

Published: Mar 2019

DOI: 10.5120/ijca2019918600

M. Babul Islam. Mel-Scaled Autoregressive (Mel-AR) Model based Voice Activity Detection using Likelihood Ratio Measure. International Journal of Computer Applications. 182, 45 (Mar 2019), 1-4. DOI=10.5120/ijca2019918600

@article{10.5120/ijca2019918600,
  author    = {M. Babul Islam},
  title     = {Mel-Scaled Autoregressive (Mel-AR) Model based Voice Activity Detection using Likelihood Ratio Measure},
  journal   = {International Journal of Computer Applications},
  year      = {2019},
  volume    = {182},
  number    = {45},
  pages     = {1-4},
  doi       = {10.5120/ijca2019918600},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}

%0 Journal Article
%D 2019
%A M. Babul Islam
%T Mel-Scaled Autoregressive (Mel-AR) Model based Voice Activity Detection using Likelihood Ratio Measure
%J International Journal of Computer Applications
%V 182
%N 45
%P 1-4
%R 10.5120/ijca2019918600
%I Foundation of Computer Science (FCS), NY, USA

Abstract

This paper presents a voice activity detector (VAD) based on a Mel-scaled autoregressive (Mel-AR) model, in which a likelihood ratio measure classifies each input frame as speech or non-speech. The Mel-AR model parameters are estimated directly from the input speech signal on the linear frequency scale, without applying a bilinear transformation; this is achieved by replacing the unit delay with a first-order all-pass filter. The performance of the proposed VAD is evaluated on the Aurora-2 database in terms of FAR and FRR. The equal false rate (EFR) at the crossover point of FAR and FRR is also reported as a figure of merit for the VAD. In addition, the usefulness of the proposed VAD for speech recognition is verified by combining it with a Mel-Wiener filter in MLPC-based noisy speech recognition.
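
The abstract does not give the estimation equations, so the following is only a minimal sketch of the general idea of replacing the unit delay with a first-order all-pass filter, not the paper's own procedure. It cross-correlates a frame with successively all-pass-filtered copies of itself to obtain autocorrelation-like values on a warped frequency axis. The function names, the warping factor `alpha`, and the value 0.35 used in the demo are all assumptions made for illustration; none of them comes from the abstract.

```python
def allpass(x, alpha):
    """First-order all-pass filter (z^-1 - alpha) / (1 - alpha * z^-1).

    Difference equation: y[n] = alpha * (y[n-1] - x[n]) + x[n-1].
    With alpha = 0 this reduces to a plain unit delay.
    """
    y = []
    prev_x = 0.0
    prev_y = 0.0
    for sample in x:
        out = alpha * (prev_y - sample) + prev_x
        y.append(out)
        prev_x, prev_y = sample, out
    return y


def warped_autocorrelation(x, order, alpha):
    """Correlate the frame x with its all-pass-filtered copies.

    Returns lags 0..order. Lag m uses the frame passed through m
    cascaded all-pass sections, so alpha = 0 gives the ordinary
    (unwarped) autocorrelation of the frame.
    """
    r = []
    stage = list(x)  # lag 0: the frame itself
    for _ in range(order + 1):
        r.append(sum(a * b for a, b in zip(x, stage)))
        stage = allpass(stage, alpha)  # one more all-pass section
    return r


if __name__ == "__main__":
    frame = [1.0, 2.0, 3.0, 4.0]  # toy frame, purely illustrative
    print(warped_autocorrelation(frame, 2, 0.0))   # unit-delay case
    print(warped_autocorrelation(frame, 2, 0.35))  # assumed warping factor
```

Values of this kind could then be passed to a standard Levinson-Durbin recursion to obtain AR coefficients. Whether the paper proceeds this way, and how it forms the likelihood ratio from the resulting models, is not stated in the abstract.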

Index Terms

Computer Science

Information Sciences

Keywords

VAD, Mel-AR model, Likelihood ratio, Itakura-Saito distortion, Aurora-2 database