Research Article

Step-by-step Approach to Automatic Speech Emotion Recognition

by Purnima Chandrasekar, Shailendra Pratap Shastri
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Issue 37
Published: August 2024
Authors: Purnima Chandrasekar, Shailendra Pratap Shastri
10.5120/ijca2024923947

Purnima Chandrasekar, Shailendra Pratap Shastri. Step-by-step Approach to Automatic Speech Emotion Recognition. International Journal of Computer Applications. 186, 37 (August 2024), 37-43. DOI=10.5120/ijca2024923947

@article{ 10.5120/ijca2024923947,
  author    = { Purnima Chandrasekar and Shailendra Pratap Shastri },
  title     = { Step-by-step Approach to Automatic Speech Emotion Recognition },
  journal   = { International Journal of Computer Applications },
  year      = { 2024 },
  volume    = { 186 },
  number    = { 37 },
  pages     = { 37-43 },
  doi       = { 10.5120/ijca2024923947 },
  publisher = { Foundation of Computer Science (FCS), NY, USA }
}

%0 Journal Article
%D 2024
%A Purnima Chandrasekar
%A Shailendra Pratap Shastri
%T Step-by-step Approach to Automatic Speech Emotion Recognition
%J International Journal of Computer Applications
%V 186
%N 37
%P 37-43
%R 10.5120/ijca2024923947
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Humans express emotions naturally, either through facial expressions or through speech. Emotions play an important role in human decision-making, as the human mind is influenced by personal experiences as well as by physiological, communicative and behavioral reactions to external stimuli. When considering emotions displayed through speech, one needs to understand that a speech signal conveys not only the emotional state of the speaker, which is visible from the intent of the message, but also the gender of the speaker and the language spoken. While effective communication between humans through speech ensures the exchange of the right amount of ideas, messages and perceptions, interaction between human and machine with the same intent is challenging, as the machine is expected to mimic the mechanism of human perception. Automatic Speech Emotion Recognition (ASER) systems have found use in several applications, such as healthcare, counseling and call center communication. Such a system has three basic components: creation of an emotional speech corpus, extraction of features relevant to emotion detection, and classification of the emotion in the test speech using appropriate classifiers. This paper extensively surveys the prominent features extracted, several dimension reduction techniques and the classifiers commonly used in recent times. It also sheds light on the use of auto encoders in recent ASER work.
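
The three components named in the abstract (corpus, feature extraction, classification), together with a dimension reduction step, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: synthetic tones stand in for an emotional speech corpus, three simple frame-level descriptors (log energy, zero-crossing rate, spectral centroid) stand in for the surveyed features, PCA is used as one example of dimension reduction, and a nearest-centroid rule stands in for the classifier. All function names are hypothetical, and only NumPy is used.

```python
import numpy as np

def frame_signal(signal, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

def extract_features(signal, sr=16000):
    """Mean and standard deviation of three frame-level descriptors."""
    frames = frame_signal(signal)
    energy = np.log(np.mean(frames ** 2, axis=1) + 1e-10)
    # Fraction of adjacent sample pairs whose sign differs.
    zcr = np.mean(np.diff(np.sign(frames), axis=1) != 0, axis=1)
    window = np.hanning(frames.shape[1])
    spectrum = np.abs(np.fft.rfft(frames * window, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    centroid = (spectrum @ freqs) / (spectrum.sum(axis=1) + 1e-10)
    descriptors = np.stack([energy, zcr, centroid], axis=1)
    return np.concatenate([descriptors.mean(axis=0), descriptors.std(axis=0)])

def make_toy_corpus(n_per_class=20, sr=16000, seed=0):
    """Synthetic stand-in for a corpus: class 0 is a quiet low tone,
    class 1 a loud high tone. Real emotional speech is far less separable."""
    rng = np.random.default_rng(seed)
    t = np.arange(sr) / sr  # one second of samples
    features, labels = [], []
    for label, (f0, amp) in enumerate([(120.0, 0.3), (260.0, 0.9)]):
        for _ in range(n_per_class):
            f = f0 * rng.uniform(0.9, 1.1)
            sig = amp * np.sin(2 * np.pi * f * t) + 0.01 * rng.standard_normal(sr)
            features.append(extract_features(sig, sr))
            labels.append(label)
    return np.array(features), np.array(labels)

def fit_pca(X, n_components):
    """Standardize the features, then take the top principal directions via SVD."""
    mean = X.mean(axis=0)
    scale = X.std(axis=0) + 1e-10
    _, _, vt = np.linalg.svd((X - mean) / scale, full_matrices=False)
    return mean, scale, vt[:n_components]

def apply_pca(X, mean, scale, components):
    return ((X - mean) / scale) @ components.T

def fit_centroids(X, y):
    """Nearest-centroid classifier: store one mean vector per class."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def predict(X, classes, centroids):
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]

def run_demo():
    X, y = make_toy_corpus()
    # Even-indexed utterances train the model, odd-indexed ones test it,
    # so both classes appear in both halves.
    X_train, y_train = X[0::2], y[0::2]
    X_test, y_test = X[1::2], y[1::2]
    mean, scale, components = fit_pca(X_train, n_components=2)
    classes, centroids = fit_centroids(
        apply_pca(X_train, mean, scale, components), y_train)
    y_pred = predict(apply_pca(X_test, mean, scale, components), classes, centroids)
    return float(np.mean(y_pred == y_test))

if __name__ == "__main__":
    print("test accuracy:", run_demo())
```

In a real system each stage would be swapped for a stronger counterpart: a recorded emotional speech corpus, spectral or prosodic features, and a trained classifier. An auto encoder, as mentioned in the abstract, would take the place of the PCA step by learning the compact representation instead of computing it in closed form; that variant is not sketched here.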

Index Terms
Computer Science
Information Sciences
Pattern Recognition
Keywords

ASER, feature extraction, dimensionality reduction, auto encoders
