The CONILIUM proposition for Odyssey Emotion Challenge: Leveraging major class with complex annotations

Meysam Shamsi, Lara Gauder, Marie Tahon

  LIUM - Laboratoire d'Informatique de l'Université du Mans, LST - Equipe Language and Speech Technology

 ICC - Instituto de Investigación en Ciencias de la Computación [Buenos Aires] 




Contact: {meysam.shamsi ; marie.tahon}@univ-lemans.fr



This paper describes the contribution of the CONILIUM team to the Odyssey Emotion Recognition Challenge. Our system predicts categorical emotions from speech recordings in the MSP-Podcast corpus. Concentrating on the training protocol, we investigated several approaches to improve emotion recognition accuracy. Different pre-trained models (WavLM-large, Wav2vec2-large, Hubert-large) were evaluated as feature extractors. We propose an agreement-aware loss function, based on all secondary annotations, that takes into account the disagreement among annotators and the ambiguity of emotional labeling during training. An experiment that keeps only samples with high annotator agreement in the training process shows the benefit of using all annotations from all annotators.
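As an illustration of what "using all annotations" and "high agreement" can mean in practice, the following minimal sketch converts the votes of several annotators into a per-class distribution and filters samples by the share of the majority class. The class list, the `votes` field, the function names, and the 0.8 threshold are all assumptions made for this example; the abstract does not specify how agreement is computed.

```python
from collections import Counter

# Illustrative label subset only; not the challenge's actual label set.
CLASSES = ["angry", "happy", "sad", "neutral"]


def soft_label(votes, classes=CLASSES):
    """Turn a list of annotator votes into a per-class distribution.

    Votes that are not in `classes` are ignored (an assumption of this sketch).
    """
    counts = Counter(votes)
    total = sum(counts[c] for c in classes)
    if total == 0:
        raise ValueError("no vote falls in the known classes")
    return [counts[c] / total for c in classes]


def agreement(votes, classes=CLASSES):
    """Share of votes that went to the most frequent class."""
    return max(soft_label(votes, classes))


def keep_high_agreement(samples, threshold=0.8):
    """Keep only samples whose majority class reaches `threshold` agreement."""
    return [s for s in samples if agreement(s["votes"]) >= threshold]
```

For example, `soft_label(["angry", "angry", "sad", "neutral"])` gives half of the mass to `angry` and a quarter each to `sad` and `neutral`, so this sample would be dropped by a 0.8 agreement filter but would still contribute a soft target to an agreement-aware loss.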

Our best-performing system used WavLM-large as the upstream model, weighted binary cross-entropy with secondary labels as the loss function, and a post-processing step that adjusted the decision threshold. This model achieved an F1-Macro score of 0.361 on the development set and 0.335 on the test set, which is a significant improvement compared to the provided baseline. We also explore the characteristics of Easy and Difficult samples based on the consistency of their prediction performance across different models.
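The two ingredients named above, a weighted binary cross-entropy over soft targets and a decision-threshold adjustment, can be sketched in plain Python as follows. This is a hedged illustration, not the team's implementation: the per-class weights, the per-class thresholds, and the "largest margin over threshold" decision rule are assumptions, since the abstract does not give their exact form.

```python
import math


def weighted_bce(probs, targets, weights, eps=1e-7):
    """Mean weighted binary cross-entropy over classes.

    probs:   predicted per-class probabilities in [0, 1]
    targets: soft targets in [0, 1], e.g. derived from secondary labels
    weights: per-class weights, e.g. larger for rare classes (assumed scheme)
    """
    total = 0.0
    for p, t, w in zip(probs, targets, weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -w * (t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(probs)


def predict(probs, thresholds):
    """Return the index of the class with the largest margin over its threshold.

    Lowering a class's threshold makes that class easier to predict; this
    rule is one possible post-processing choice, assumed for illustration.
    """
    margins = [p - th for p, th in zip(probs, thresholds)]
    return max(range(len(probs)), key=lambda i: margins[i])
```

With equal thresholds, `predict` reduces to a plain argmax over the probabilities; giving a rare class a lower threshold shifts decisions toward it, which is the kind of adjustment that can help a macro-averaged F1 score.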






