Research Article

Managing Distribution Shift in Speech Emotion Recognition: An Empirical Study with Confidence-Based Filtering

by Ruwini Madhushika Herath
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Issue 63
Published: December 2025
Authors: Ruwini Madhushika Herath
10.5120/ijca2025926040

Ruwini Madhushika Herath. Managing Distribution Shift in Speech Emotion Recognition: An Empirical Study with Confidence-Based Filtering. International Journal of Computer Applications. 187, 63 (December 2025), 26-33. DOI=10.5120/ijca2025926040

                        @article{ 10.5120/ijca2025926040,
                        author  = { Ruwini Madhushika Herath },
                        title   = { Managing Distribution Shift in Speech Emotion Recognition: An Empirical Study with Confidence-Based Filtering },
                        journal = { International Journal of Computer Applications },
                        year    = { 2025 },
                        volume  = { 187 },
                        number  = { 63 },
                        pages   = { 26-33 },
                        doi     = { 10.5120/ijca2025926040 },
                        publisher = { Foundation of Computer Science (FCS), NY, USA }
                        }
                        %0 Journal Article
                        %D 2025
                        %A Ruwini Madhushika Herath
                        %T Managing Distribution Shift in Speech Emotion Recognition: An Empirical Study with Confidence-Based Filtering
                        %J International Journal of Computer Applications
                        %V 187
                        %N 63
                        %P 26-33
                        %R 10.5120/ijca2025926040
                        %I Foundation of Computer Science (FCS), NY, USA
Abstract

Speech emotion recognition (SER) plays an important role in human–computer interaction, healthcare, and customer service. Yet SER models often degrade when applied across genders or to external corpora, limiting their reliability in real-world deployments. This study investigates the robustness of three classical classifiers (Logistic Regression, Random Forests, and XGBoost) under gender and domain shifts, with a focus on confidence-based routing as a mitigation strategy. In-domain experiments showed strong performance for the tree-based ensembles, with Random Forests reaching up to 0.879 accuracy and XGBoost 0.914 under gender-specific training, while Logistic Regression performed poorly (0.478). Cross-domain evaluation on the RAVDESS corpus revealed sharp declines: Random Forest accuracy dropped to 0.466, and XGBoost models failed in cross-gender transfer (0.266–0.311). High-arousal emotions generalized more reliably than low-arousal categories, which were widely misclassified. To improve reliability, a confidence-filtering mechanism was introduced. With a threshold of ≥0.60, Random Forest accuracy recovered to 0.811 (macro-F1 = 0.602), but only on the 7% of predictions that passed the filter. Although coverage is limited, this serves as a proof of concept that selective prediction can recover trustworthy outputs under distribution shift. These findings highlight the limitations of current SER models when the data distribution changes, but they also suggest a practical path forward. For both emotion recognition and future stress detection, incorporating confidence-aware routing may be as important as improving raw accuracy, because it enables selective and trustworthy predictions in sensitive applications.
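
The confidence-filtering idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state how confidence is computed, so this example assumes it is the highest class probability a classifier assigns to a sample. The function names `confidence_filter` and `coverage_and_accuracy` and the toy probabilities are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple


def confidence_filter(
    probas: Sequence[Sequence[float]], threshold: float = 0.60
) -> List[Tuple[int, int]]:
    """Return (sample_index, predicted_label) for confident predictions only.

    Assumption: confidence is the largest class probability in each row.
    """
    accepted = []
    for i, row in enumerate(probas):
        # Predicted label is the index of the highest class probability.
        label = max(range(len(row)), key=row.__getitem__)
        if row[label] >= threshold:
            accepted.append((i, label))
    return accepted


def coverage_and_accuracy(
    accepted: Sequence[Tuple[int, int]], y_true: Sequence[int], n_total: int
) -> Tuple[float, Optional[float]]:
    """Coverage is the share of samples kept; accuracy is measured on those only."""
    coverage = len(accepted) / n_total if n_total else 0.0
    if not accepted:
        return coverage, None  # accuracy is undefined when nothing is kept
    correct = sum(1 for i, label in accepted if y_true[i] == label)
    return coverage, correct / len(accepted)


# Toy, made-up class probabilities for four samples and three classes.
probas = [
    [0.70, 0.20, 0.10],
    [0.40, 0.35, 0.25],
    [0.10, 0.65, 0.25],
    [0.34, 0.33, 0.33],
]
y_true = [0, 1, 2, 0]

accepted = confidence_filter(probas, threshold=0.60)
coverage, accuracy = coverage_and_accuracy(accepted, y_true, len(probas))
print(accepted, coverage, accuracy)
```

Raising the threshold generally trades coverage for accuracy on the retained subset, which matches the pattern the abstract reports: higher accuracy on a small fraction of predictions. Samples that fall below the threshold would need to be routed elsewhere, for example to a human reviewer or a fallback model.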

References
  • Cao, H., Cooper, D.G., Keutmann, M.K., Gur, R.C., Nenkova, A., Verma, R., 2014. CREMA-D: Crowd-Sourced Emotional Multimodal Actors Dataset. IEEE Trans. Affective Comput. 5, 377–390. https://doi.org/10.1109/TAFFC.2014.2336244
  • Fucci, D., Gaido, M., Negri, M., Cettolo, M., Bentivogli, L., 2023. No Pitch Left Behind: Addressing Gender Unbalance in Automatic Speech Recognition through Pitch Manipulation. https://doi.org/10.48550/arXiv.2310.06590
  • Kim, J., Englebienne, G., Truong, K.P., Evers, V., 2017. Towards Speech Emotion Recognition “in the wild” using Aggregated Corpora and Deep Multi-Task Learning. https://doi.org/10.48550/arXiv.1708.03920
  • Lee, C.-C., Chaspari, T., Provost, E.M., Narayanan, S.S., 2023. An Engineering View on Emotions and Speech: From Analysis and Predictive Models to Responsible Human-Centered Applications. Proc. IEEE 111, 1142–1158. https://doi.org/10.1109/JPROC.2023.3276209
  • Livingstone, S.R., Russo, F.A., 2018. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13, e0196391. https://doi.org/10.1371/journal.pone.0196391
  • Kalokhe, T.D., Kulkarni, R., 2024. A Comprehensive Review of Machine Learning Approaches for Speech Emotion Recognition. IJARSCT 60–73. https://doi.org/10.48175/IJARSCT-22308
  • Wani, T.M., Gunawan, T.S., Qadri, S.A.A., Kartiwi, M., Ambikairajah, E., 2021. A Comprehensive Review of Speech Emotion Recognition Systems. IEEE Access 9, 47795–47814. https://doi.org/10.1109/ACCESS.2021.3068045
Index Terms
Computer Science
Information Sciences
Keywords

Speech emotion recognition; Domain shift; Random Forest; XGBoost; Confidence filtering; Stress detection
