10 February 2025 Hearing emotions: fine-tuning speech emotion recognition models
Parvez Mohammed, Bakir Hadžić, Mohammed Eyad Alkostantini, Naoyuki Kubota, Youssef Shiban, Matthias Rätsch
Proceedings Volume 13540, Fifth Symposium on Pattern Recognition and Applications (SPRA 2024); 1354005 (2025) https://doi.org/10.1117/12.3057659
Event: The 5th Symposium on Pattern Recognition and Applications (SPRA 2024), 2024, Istanbul, Turkey
Abstract
Over the past few decades, researchers from various disciplines have been motivated to develop automated emotion detection systems. In this pursuit, audio data, and especially prosodic features, hold the most promise for delivering satisfactory results. The main aim of this work was therefore to evaluate standard machine learning algorithms on the task of emotion recognition from audio data. After conducting zero-shot testing on a range of datasets widely used in the literature, such as CREMA-D, RAVDESS, TESS, SAVEE, MELD, eNTERFACE, EmoDB, and IEMOCAP, we evaluate the effect of training dataset size on model performance by means of incremental fine-tuning. To improve model generalizability, we used data augmentation, and for robust emotion detection we used feature extraction techniques such as Mel-frequency cepstral coefficients (MFCC), zero-crossing rate (ZCR), and root-mean-square (RMS) energy. On CREMA-mixed datasets, experimental results show high initial accuracy with a CNN model. Cross-corpus validation highlights the importance of diverse datasets, showing significant accuracy improvements with incremental fine-tuning. By highlighting the necessity of varied training data for robust, generalizable speech emotion recognition (SER) models, our research opens the door to more capable emotion detection systems in practical applications.
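As an illustration of two of the features named in the abstract, the following is a minimal sketch of frame-wise ZCR and RMS computation. It is not the authors' pipeline: the function names, the 400-sample frame length, the 160-sample hop (25 ms and 10 ms at an assumed 16 kHz sampling rate), and the synthetic test tone are all assumptions chosen for the example. MFCC extraction is omitted because it would normally rely on an audio library rather than a few lines of code.

```python
import math

def frame_signal(samples, frame_length, hop_length):
    """Split a sequence of samples into overlapping frames (no padding)."""
    frames = []
    for start in range(0, len(samples) - frame_length + 1, hop_length):
        frames.append(samples[start:start + frame_length])
    return frames

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)

def rms_energy(frame):
    """Root-mean-square amplitude of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def extract_features(samples, frame_length=400, hop_length=160):
    """Return one (ZCR, RMS) pair per frame; frame sizes are assumed values."""
    frames = frame_signal(samples, frame_length, hop_length)
    return [(zero_crossing_rate(f), rms_energy(f)) for f in frames]

# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
tone = [
    0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
    for n in range(sample_rate)
]
features = extract_features(tone)
```

In a real SER setup the per-frame values would typically be stacked with MFCCs into a feature matrix before being passed to a classifier such as a CNN; how the paper combines them is not stated in the abstract.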
(2025) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Parvez Mohammed, Bakir Hadžić, Mohammed Eyad Alkostantini, Naoyuki Kubota, Youssef Shiban, and Matthias Rätsch "Hearing emotions: fine-tuning speech emotion recognition models", Proc. SPIE 13540, Fifth Symposium on Pattern Recognition and Applications (SPRA 2024), 1354005 (10 February 2025); https://doi.org/10.1117/12.3057659
KEYWORDS
Data modeling
Emotion
Performance modeling
Education and training
Machine learning
Speech recognition
Systems modeling