Plenary talk by Victoria Mingote and Pablo Gimeno (UNIZAR), JSALT 2023

Representation and Metric Learning Advances for Face and Speaker Biometric Systems, Victoria Mingote

Abstract: Despite recent advances, deep learning techniques still struggle when a task has limited data or when an approach that succeeds in one task is applied to another. In this talk, I will therefore present alternative approaches to deal with these issues in biometric systems. The first part of the talk focuses on different ways to improve the generation of signal representations for the text-dependent speaker verification task, since this task depends strongly on the phonetic content. In the second part, I will explain several approaches that use new training loss functions for deep neural networks based on the final verification metrics. These training loss functions can be applied to different verification tasks.


Multiclass audio segmentation in broadcast environments, Pablo Gimeno

Abstract: Audio segmentation can be defined as the division of an audio signal into smaller fragments according to a predefined set of attributes. This broad definition can cover several kinds of systems, depending on the set of rules considered. In this talk, the focus will be on multiclass audio segmentation tasks, which aim to obtain a set of labels describing several typologies in an audio signal, such as speech, music and noise. During the presentation, different approaches will be presented and evaluated on broadcast-domain data.



Watch the live stream here

