Toroidal Probabilistic Spherical Discriminant Analysis

Anna Silnova, Niko Brümmer, Albert Swart and Lukáš Burget

Brno University of Technology, Speech@FIT and IT4I Center of Excellence, Brno, Czechia

Amazon Alexa, South Africa

Speechly, Finland



Contact: isilnova@fit.vutbr.cz



DOI: 10.1109/ICASSP49357.2023.10095580

In speaker recognition, where speech segments are mapped to embeddings on the unit hypersphere, two scoring back-ends are commonly used: cosine scoring and PLDA. We have recently proposed PSDA, an analog of PLDA that uses von Mises-Fisher distributions instead of Gaussians.
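To make the cosine-scoring baseline concrete, here is a minimal sketch in plain Python. It assumes embeddings are given as lists of floats; the function names `length_normalize` and `cosine_score` and the toy vectors are illustrative only and do not come from the paper.

```python
import math


def length_normalize(x):
    """Project a vector onto the unit hypersphere (L2 normalization)."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in x]


def cosine_score(enroll, test):
    """Cosine similarity: dot product of the two length-normalized embeddings."""
    a = length_normalize(enroll)
    b = length_normalize(test)
    return sum(p * q for p, q in zip(a, b))


# Toy 3-dimensional "embeddings", made up for illustration.
same_direction = cosine_score([1.0, 2.0, 2.0], [2.0, 4.0, 4.0])
orthogonal = cosine_score([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

A higher score indicates that the two embeddings point in more similar directions on the hypersphere; in a verification setting the score would be compared against a threshold chosen on held-out data.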

In this paper, we present toroidal PSDA (T-PSDA). It extends PSDA with the ability to model within- and between-speaker variabilities in toroidal submanifolds of the hypersphere. Like PLDA and PSDA, the model allows closed-form scoring and closed-form EM updates for training. On VoxCeleb, we find T-PSDA accuracy on par with cosine scoring, while PLDA accuracy is inferior. On NIST SRE’21, we find that T-PSDA gives large accuracy gains compared to both cosine scoring and PLDA.



