ESPERANTO

Exchanges for SPEech ReseArch aNd TechnOlogies

Secondment of Norbert Tsopze

From Université Yaounde 1 to Le Mans Université

Norbert Tsopze, from Université Yaounde 1, was hosted for 1 month by Le Mans Université.

Exploring Pre-trained SSL Model Embeddings for TTS in Low-Resource Languages and Explaining Their Performance

 

For low-resource languages, it is difficult to collect matching text and speech data, which is not the case for high-resource languages such as English. In the context of text-to-speech (TTS), self-supervised learning (SSL) models such as HuBERT or WavLM learn rich audio representations without relying on transcripts, which makes them especially suitable for languages with limited linguistic resources. They also improve language understanding and speech processing by helping to match segments of text with the corresponding segments of speech. During this secondment, I began using probing techniques to investigate the internal representations learned by self-supervised models such as WavLM, with a focus on how they encode speaker-related characteristics in low-resource language settings. By analyzing embeddings from different layers of the model, I aim to uncover their relationship with speaker identity, gender, and other relevant attributes. This work aims to better understand the potential of SSL models for reuse in diverse TTS tasks, especially in low-resource language settings where annotated data is scarce.
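The layer-wise probing idea can be illustrated with a minimal sketch. It is not the setup used during the secondment: the embeddings are synthetic stand-ins for mean-pooled per-layer WavLM outputs, the binary label stands in for an attribute such as speaker gender, the probe is a simple nearest-centroid classifier chosen for brevity, and all function names (`make_synthetic_layers`, `probe_accuracy`, `probe_all_layers`) are hypothetical. The synthetic data is built so that the attribute becomes easier to read out at higher layer indices, which is an assumption made purely for illustration.

```python
import random


def make_synthetic_layers(n_layers=4, n_utts=200, dim=8, seed=0):
    """Build fake per-layer utterance embeddings (assumption: stand-in for
    mean-pooled SSL hidden states). Class separation grows with layer index."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_utts)]  # balanced binary attribute
    layers = []
    for layer in range(n_layers):
        shift = 1.0 * layer  # layer 0 carries no attribute signal by design
        feats = []
        for y in labels:
            vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            vec[0] += shift if y == 1 else -shift  # signal lives in dim 0
            feats.append(vec)
        layers.append(feats)
    return layers, labels


def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def probe_accuracy(feats, labels, train_frac=0.5):
    """Fit a nearest-centroid probe on the first part of the data and
    return its accuracy on the held-out remainder."""
    split = int(len(feats) * train_frac)
    train_x, train_y = feats[:split], labels[:split]
    test_x, test_y = feats[split:], labels[split:]
    cents = {
        c: centroid([x for x, y in zip(train_x, train_y) if y == c])
        for c in set(train_y)
    }
    correct = sum(
        1
        for x, y in zip(test_x, test_y)
        if min(cents, key=lambda c: sq_dist(x, cents[c])) == y
    )
    return correct / len(test_x)


def probe_all_layers(layers, labels):
    """Run the same probe independently on every layer's embeddings."""
    return [probe_accuracy(feats, labels) for feats in layers]


if __name__ == "__main__":
    layers, labels = make_synthetic_layers()
    for idx, acc in enumerate(probe_all_layers(layers, labels)):
        print(f"layer {idx}: probe accuracy = {acc:.2f}")
```

With real data, the synthetic generator would be replaced by embeddings extracted from each layer of the SSL model, and comparing probe accuracy across layers would indicate where a given speaker attribute is most linearly accessible.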