Improved Vocal Effort Transfer Vector Estimation For Vocal Effort-Robust Speaker Verification

I. López-Espejo, S. Prieto, Alfonso Ortega, Eduardo Lleida

Department of Electronic Systems, Aalborg University, Denmark

Center for Robust Speech Systems (CRSS), The University of Texas at Dallas, USA


VeriDas | das-Nano, Navarre, Spain


ViVoLab, Aragón Institute for Engineering Research (I3A), University of Zaragoza, Spain


Contact: {ortega, lleida}@unizar.es


doi: 10.1109/MLSP55844.2023.10285923


Despite the maturity of modern speaker verification technology, its performance still significantly degrades when facing non-neutrally-phonated (e.g., shouted and whispered) speech.

To address this issue, in this paper, we propose a new speaker embedding compensation method based on a minimum mean square error (MMSE) estimator. This method models the joint distribution of the vocal effort transfer vector and non-neutrally-phonated embedding spaces and operates in a principal component analysis domain to cope with non-neutrally-phonated speech data scarcity. Experiments are carried out using a cutting-edge speaker verification system integrating a powerful self-supervised pre-trained model for speech representation.
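As a rough illustration of the idea of MMSE estimation in a principal component analysis domain, the sketch below fits a linear MMSE estimator of the transfer vector from a non-neutrally-phonated embedding. It rests on assumptions that are not stated in the abstract: the joint distribution is approximated by a single Gaussian, the transfer vector is taken to be the neutral embedding minus the non-neutral one, and the names `fit_pca`, `fit_mmse`, `compensate` and the `ridge` parameter are hypothetical. The paper's actual model may differ.

```python
import numpy as np


def fit_pca(X, k):
    """Return the mean of X and its top-k principal directions as a (d, k) matrix."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centred data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T


def fit_mmse(X_non, T, k, ridge=1e-6):
    """Fit a linear MMSE estimator of transfer vectors T from embeddings X_non.

    X_non: (n, d) non-neutrally-phonated embeddings.
    T:     (n, d) transfer vectors (assumed here: neutral minus non-neutral).
    k:     number of principal components kept in each space.
    ridge: small diagonal loading (an assumption, added for numerical stability).
    """
    mx, Px = fit_pca(X_non, k)
    mt, Pt = fit_pca(T, k)
    Zx = (X_non - mx) @ Px  # embeddings in the PCA domain
    Zt = (T - mt) @ Pt      # transfer vectors in the PCA domain
    n = Zx.shape[0]
    Sxx = Zx.T @ Zx / n + ridge * np.eye(k)  # covariance of the embeddings
    Stx = Zt.T @ Zx / n                      # cross-covariance
    W = Stx @ np.linalg.inv(Sxx)             # Gaussian MMSE gain matrix
    return {"mx": mx, "Px": Px, "mt": mt, "Pt": Pt, "W": W}


def compensate(model, x):
    """Add the estimated transfer vector to x (shape (d,) or (n, d))."""
    zx = (x - model["mx"]) @ model["Px"]
    zt = zx @ model["W"].T  # conditional mean; PCA-domain means are zero
    t_hat = model["mt"] + zt @ model["Pt"].T
    return x + t_hat
```

Under the single-Gaussian assumption, the conditional mean of the transfer vector given the embedding is linear, which is why the estimator reduces to one gain matrix `W`. Keeping only `k` components shrinks the covariance matrices that must be estimated, which is one plausible way a PCA domain can help when non-neutral data are scarce.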

In comparison with a state-of-the-art embedding compensation method, the proposed MMSE estimator yields superior and competitive equal error rate results when tackling shouted and whispered speech, respectively.

