Improving Speaker Diarization for Low-Resourced Sarawak Malay Language Conversational Speech Corpus

Rahim Mohd Zulhafiz; Sarah Samson Juan; Fitri Suraya Mohamad

Faculty of Computer Science and Information Technology
Universiti Malaysia Sarawak, Malaysia

Institute of Borneo Studies
Universiti Malaysia Sarawak, Malaysia




DOI: 10.1109/IALP61005.2023.10337314

Speaker diarization plays a vital role in transcribing conversational speech, as it improves the accuracy, comprehensibility, and usability of the transcribed content. Diarized transcripts present conversation data in a more structured form, enabling a variety of applications that rely on accurate speaker attribution. Even so, speaker diarization has been less explored for low-resourced languages, as the resources currently optimized for and applied to speaker diarization mostly target well-resourced languages such as English, Spanish, or French.

In this paper, we propose an approach that uses pseudo-labeled speech data to self-train x-vector models and improve diarization accuracy. The proposed method uses an unlabeled Sarawak Malay conversational speech corpus of almost 13 hours, obtained from the Kalaka: Language Map of Malaysia website, for training, and 1 hour and 26 minutes of manually labeled Sarawak Malay speech data for testing and evaluation. We demonstrate how speaker diarization models can be fine-tuned with the pseudo-labeled data.
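
As a rough illustration of the pseudo-labelling idea, the toy Python sketch below assigns each unlabeled segment embedding to its nearest speaker centroid and keeps only confident assignments for the next training round. It is not the authors' pipeline: the function names, the `min_margin` confidence threshold, the two-dimensional vectors standing in for x-vectors, and the use of a centroid update in place of actual x-vector fine-tuning are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pseudo_label(embeddings, centroids, min_margin=0.1):
    """Return (segment_index, speaker) pairs for confident assignments only.

    Confidence is the gap between the best and second-best cosine score;
    this margin rule is a hypothetical choice, not taken from the paper.
    """
    selected = []
    for idx, emb in enumerate(embeddings):
        scores = sorted(
            ((cosine(emb, c), spk) for spk, c in enumerate(centroids)),
            reverse=True,
        )
        best_score, best_spk = scores[0]
        runner_up = scores[1][0] if len(scores) > 1 else 0.0
        if best_score - runner_up >= min_margin:
            selected.append((idx, best_spk))
    return selected


def update_centroids(embeddings, selected, centroids):
    """Stand-in for fine-tuning: re-estimate each speaker centroid from
    the pseudo-labelled segments, keeping the old one if none were kept."""
    updated = []
    for spk, old in enumerate(centroids):
        members = [embeddings[i] for i, s in selected if s == spk]
        if members:
            updated.append([sum(col) / len(members) for col in zip(*members)])
        else:
            updated.append(old)
    return updated


# Toy "unlabeled" segment embeddings; the last one is ambiguous.
segments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.7, 0.7]]
initial_centroids = [[1.0, 0.0], [0.0, 1.0]]

pseudo_labels = pseudo_label(segments, initial_centroids)
new_centroids = update_centroids(segments, pseudo_labels, initial_centroids)
```

In this toy run the ambiguous fifth segment is discarded because both speakers score equally, so only the four clear segments contribute to the updated centroids. In a real system the retained pseudo-labels would instead serve as training targets for the embedding model.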


