A Phonetic Analysis of Speaker Verification Systems through Phoneme Selection and Integrated Gradients

Thomas Thebaud, Gabriel Hernandez Sierra, Sarah Flora Samson Juan, Marie Tahon

JHU - Johns Hopkins University

CENATAV - Centro de Aplicaciones de Tecnologías de Avanzada [Havana]

UNIMAS - University of Malaysia [Sarawak]

LIUM - Laboratoire d'Informatique de l'Université du Mans



Contact: tthebau1[@]jhu.edu; gabrielcuba[@]gmail.com




Speaker recognition systems are usually designed to identify or verify a speaker independently of the linguistic content of the utterance. We use two explainability techniques to analyze the impact of phonetic variation on a speaker verification system using VoxCeleb.

We use Whisper and the Montreal Forced Aligner (MFA) to transcribe and then phonetically segment the VoxCeleb1 test set. Phoneme selection is applied first, before the x-vectors are computed, to observe which phonemes are the most discriminative through their impact on the equal error rate (EER) and minimum detection cost function (MinDCF) metrics. Integrated Gradients are then used to show which phonemes yield the highest gradients when comparing two speakers.
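To make the Integrated Gradients step more concrete, the following is a minimal sketch of how attributions can be computed and then aggregated per phoneme. It is an illustration under stated assumptions, not the system used in this work: the `score` function is a toy differentiable stand-in for a speaker verification score, and the frame values, weights, all-zero baseline, and phoneme labels are hypothetical.

```python
from collections import defaultdict
import math


def score(frames, weights):
    """Toy differentiable score (assumption): tanh of a weighted sum of frames."""
    return math.tanh(sum(w * x for w, x in zip(weights, frames)))


def score_gradient(frames, weights):
    """Analytic gradient of `score` with respect to each frame value."""
    s = score(frames, weights)
    return [w * (1.0 - s * s) for w in weights]


def integrated_gradients(frames, baseline, weights, steps=200):
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    Gradients are averaged along the straight path from `baseline` to
    `frames`, then scaled by the input difference (frames - baseline).
    """
    total = [0.0] * len(frames)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (x - b) for x, b in zip(frames, baseline)]
        for i, g in enumerate(score_gradient(point, weights)):
            total[i] += g
    return [(x - b) * t / steps for x, b, t in zip(frames, baseline, total)]


def attribution_per_phoneme(attributions, labels):
    """Sum absolute frame attributions for each phoneme label."""
    sums = defaultdict(float)
    for a, p in zip(attributions, labels):
        sums[p] += abs(a)
    return dict(sums)


# Hypothetical inputs: four frames, each tagged with a phoneme label
# such as a forced aligner might provide.
frames = [0.5, -1.0, 2.0, 0.3]
weights = [0.2, -0.1, 0.4, 0.3]
baseline = [0.0, 0.0, 0.0, 0.0]
labels = ["AH", "AH", "S", "S"]

attributions = integrated_gradients(frames, baseline, weights)
per_phoneme = attribution_per_phoneme(attributions, labels)
print(per_phoneme)
```

A useful sanity check on any such implementation is the completeness property: the attributions should sum (approximately) to the difference between the score at the input and the score at the baseline.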

We find that both consonants and vowels contribute to the x-vector representation used in speaker recognition systems: both are important for capturing the distinctive characteristics of a speaker's voice and for generating discriminative representations.


