Alam_Odyssey2022 - ESPERANTO

Development of ABC Systems for the 2021 Edition of NIST Speaker Recognition Evaluation

Jahangir Alam, Radek Beneš, Marián Beszédeš, Lukáš Burget, Mohamed Dahmane, Abderrahim Fathan, Hamed Ghodrati, Ondřej Glembek, Woo Hyun Kang, Pavel Matějka, Ladislav Mošner, Oldřich Plchot, Johan Rohdin, Anna Silnova, Themos Stafylakis

Brno University of Technology, Speech@FIT and IT4I Center of Excellence, Brno, Czechia
Omilia - Conversational Intelligence, Athens, Greece
Computer Research Institute of Montreal (CRIM), Montreal (Quebec) Canada
Innovatrics, Bratislava, Slovakia

In this contribution, we describe the ABC team's collaborative efforts toward developing speaker verification systems for the NIST Speaker Recognition Evaluation 2021 (NIST-SRE2021). Cross-lingual and cross-dataset trials are the two main challenges introduced in NIST-SRE2021. The ABC team's submissions are the result of active collaboration among researchers from BUT, CRIM, Omilia and Innovatrics. We took part in all three closed-condition tracks: audio-only, audio-visual and visual-only verification.

Our audio-only systems follow the paradigm of deep speaker embeddings (e.g., x-vectors) followed by PLDA scoring. As embedding extractors, we use variants of residual neural network (ResNet), factored time delay neural network (FTDNN) and Hybrid Neural Network (HNN) architectures. The HNN embedding extractor combines CNN, LSTM and TDNN networks and incorporates a multi-level global-local statistics pooling method to aggregate speaker information over both short time spans and the utterance-level context. Our visual-only systems are based on pretrained embedding extractors employing variants of ResNet, with scoring based on cosine distance. To build an audio-visual system, we simply fuse the outputs of independent audio and visual systems. Our final submitted systems are obtained by score-level fusion of subsystems followed by score calibration.
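
To make the scoring and fusion steps more concrete, here is a minimal sketch in Python. It is not the ABC team's implementation. The function names, the toy embeddings, the audio score and the fusion weights are all hypothetical. It assumes cosine scoring is computed as the cosine similarity between an enrollment and a test embedding, and that score-level fusion is a weighted sum of subsystem scores. In practice the weights would be learned on development data, and the audio score would come from a PLDA backend.

```python
import math


def cosine_score(enroll, test):
    """Cosine similarity between two embeddings (higher = more likely same identity)."""
    dot = sum(a * b for a, b in zip(enroll, test))
    norm_enroll = math.sqrt(sum(a * a for a in enroll))
    norm_test = math.sqrt(sum(b * b for b in test))
    return dot / (norm_enroll * norm_test)


def fuse_scores(scores, weights, bias=0.0):
    """Score-level fusion as a weighted sum of subsystem scores (assumed linear form)."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return bias + sum(w * s for w, s in zip(weights, scores))


# Toy 3-dimensional face embeddings; real embeddings are much larger.
enroll_face = [0.2, 0.9, 0.1]
test_face = [0.1, 0.8, 0.3]
visual_score = cosine_score(enroll_face, test_face)

# Hypothetical audio score, standing in for a PLDA log-likelihood ratio.
audio_score = 2.5

# Hypothetical fusion weights, chosen only for illustration.
fused = fuse_scores([audio_score, visual_score], weights=[0.6, 0.4])
print(f"visual={visual_score:.3f} fused={fused:.3f}")
```

A linear fusion keeps the audio and visual systems fully independent, which matches the description above of fusing the outputs of separately developed subsystems. Calibration would then be applied to the fused score.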

DOI: 10.21437/Odyssey.2022-48

Published on July 6, 2022
