Environmental domain embedding for far-field speaker verification
21 April 2022
Ze-xi Zhu, Ye Jiang
Proceedings Volume 12175, International Conference on Network Communication and Information Security (ICNCIS 2021); 121750B (2022) https://doi.org/10.1117/12.2628416
Event: International Conference on Network Communication and Information Security (ICNCIS 2021), 2021, Beijing, China
Abstract
Neural-network-based speaker embeddings have achieved strong performance in speaker verification. However, this approach adapts poorly to far-field speech. To address this problem, this paper proposes applying environmental domain embedding to far-field speaker verification. By mapping the speech environment to a domain embedding and combining it with speaker features, the model is guided to capture both environmental information and speaker identity, which improves far-field recognition performance. In contrast to traditional methods that use a single domain and a single embedding depth, we explore the effects of domain type and embedding depth across gender, recording distance, and noise. Furthermore, the embeddings are applied to the convolutional layers through reshaping rather than through vanilla fully-connected layers, which allows a better combination. Experiments show that, compared with the baseline (ResNet-34, EER = 1.28%), the EER (Equal Error Rate) of the gender-based, distance-based, and noise-based models decreases to 1.12%, 0.92%, and 0.88%, respectively. Further comparison tests show that performing the combination in the second or third model stage achieves better performance.
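The results in the abstract are reported as EER, the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how such a figure could be computed from verification trial scores. The function names, the simple threshold sweep, and the toy scores are illustrative assumptions and are not taken from the authors' evaluation code.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping the threshold over all observed scores.

    Hypothetical helper for illustration: a trial is accepted when its score
    is greater than or equal to the threshold.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection rate: same-speaker trials scoring below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: different-speaker trials at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction with respect to a baseline (hypothetical helper)."""
    return (baseline_eer - new_eer) / baseline_eer


# Toy scores, invented for this example only.
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(targets, nontargets))  # 0.25

# EER values quoted in the abstract (percent): baseline 1.28, noise-based 0.88.
print(round(relative_reduction(1.28, 0.88) * 100, 2))  # 31.25
```

Applied to the figures quoted in the abstract, the same arithmetic gives relative EER reductions of roughly 12.5% (gender-based), 28.1% (distance-based), and 31.3% (noise-based) over the ResNet-34 baseline.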
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Ze-xi Zhu and Ye Jiang "Environmental domain embedding for far-field speaker verification", Proc. SPIE 12175, International Conference on Network Communication and Information Security (ICNCIS 2021), 121750B (21 April 2022); https://doi.org/10.1117/12.2628416
KEYWORDS
Convolution, Speaker recognition, Performance modeling, Feature extraction, Signal to noise ratio, Computer programming, Data modeling