Department of Electronic Systems, Aalborg University, Denmark
Center for Robust Speech Systems (CRSS), The University of Texas at Dallas, USA
VeriDas | das-Nano, Navarre, Spain
ViVoLab, Aragón Institute for Engineering Research (I3A), University of Zaragoza, Spain
Contact: {ortega, lleida}@unizar.es
doi: 10.1109/MLSP55844.2023.10285923
Despite the maturity of modern speaker verification technology, its performance still degrades significantly on non-neutrally-phonated (e.g., shouted and whispered) speech.
To address this issue, in this paper, we propose a new speaker embedding compensation method based on a minimum mean square error (MMSE) estimator. The method models the joint distribution of the vocal effort transfer vector and the non-neutrally-phonated embedding space, and operates in a principal component analysis (PCA) domain to cope with the scarcity of non-neutrally-phonated speech data. Experiments are carried out using a cutting-edge speaker verification system that integrates a powerful self-supervised pre-trained model for speech representation.
In comparison with a state-of-the-art embedding compensation method, the proposed MMSE estimator yields superior equal error rate results on shouted speech and competitive results on whispered speech.
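To make the abstract's idea concrete, the sketch below illustrates (with synthetic toy data, not the paper's embeddings or its exact model) the general form of an MMSE embedding compensation: project the non-neutral embeddings into a PCA domain, fit a jointly Gaussian model of (neutral embedding, PCA coordinates), and use the conditional mean E[x|z] = mu_x + C_xz C_zz^{-1} (z - mu_z) as the compensated embedding. All variable names and data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy paired data standing in for speaker embeddings (hypothetical):
# x = neutral embeddings lying near a low-dimensional subspace,
# y = non-neutral embeddings = x + a fixed shift + noise
#     (a crude stand-in for a vocal effort transfer vector).
n, d, k = 500, 8, 4
x = rng.normal(size=(n, k)) @ rng.normal(size=(k, d))
y = x + rng.normal(size=d) + 0.5 * rng.normal(size=(n, d))

# PCA on the non-neutral embeddings (the low-dimensional domain is
# meant to cope with data scarcity, as the abstract suggests).
mu_y = y.mean(axis=0)
_, _, Vt = np.linalg.svd(y - mu_y, full_matrices=False)
P = Vt[:k].T                      # (d, k) principal directions
z = (y - mu_y) @ P                # PCA-domain coordinates

# Jointly Gaussian model of (x, z): the MMSE estimate of x given z
# is the conditional mean  E[x|z] = mu_x + C_xz C_zz^{-1} (z - mu_z).
mu_x, mu_z = x.mean(axis=0), z.mean(axis=0)
Xc, Zc = x - mu_x, z - mu_z
C_xz = Xc.T @ Zc / (n - 1)        # (d, k) cross-covariance
C_zz = Zc.T @ Zc / (n - 1)        # (k, k) covariance of z

def mmse_compensate(y_new):
    """Map non-neutral embeddings toward their neutral counterparts."""
    z_new = (y_new - mu_y) @ P
    return mu_x + (z_new - mu_z) @ np.linalg.solve(C_zz, C_xz.T)

x_hat = mmse_compensate(y)
# On this toy data, compensation shrinks the gap to the neutral embeddings.
assert np.mean((x_hat - x) ** 2) < np.mean((y - x) ** 2)
```

This only sketches the estimator's linear-Gaussian form; the paper's actual joint model, embedding extractor, and training protocol are described in the body of the work.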