ViVoLab, Aragón Institute for Engineering Research (I3A), University of Zaragoza, Spain
Contact: {dribas, amiguel, ortega, lleida}@unizar.es
To give medical doctors more versatile means of health assistance, automatic voice disorder detection systems enable the remote diagnosis, treatment, and monitoring of voice pathologies. The main obstacle to developing this technology is the limited availability of audio data of healthy and pathological voices manually labeled by experts. The Saarbruecken Voice Database (SVD), created in 1997, contains more than 5 hours of healthy and pathological audio data and has been widely used for developing voice disorder detection systems. However, issues in its data distribution and labeling make it difficult to conduct conclusive studies.
This paper evaluates an Automatic Voice Disorder Detection (AVDD) system on the recent Advanced Voice Function Assessment Database (AVFAD), which contains almost 40 hours of audio data, using SVD as a reference. The system combines a representation based on spectral, prosodic, and voice quality parameters with an SVM classifier, and obtains up to 88% accuracy on phrases and 86% on the sustained vowel /a/. To handle data imbalance, we assess a data augmentation strategy based on the SMOTE method, which improves the performance of male, female, and gender-independent models without degrading the results in scenarios where the data are already balanced.
Finally, we release the system implementation for voice disorder detection including the list of train-test partitions for both databases.
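As an illustration of the SMOTE-style augmentation mentioned above, the following is a minimal sketch and not the released implementation. It uses only the Python standard library; the function name `smote`, the parameters `n_new`, `k`, and `seed`, and the toy two-dimensional feature vectors are all assumptions made for the example. Each synthetic sample is drawn on the line segment between a randomly chosen minority-class sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of feature vectors (lists of floats) from the smaller class.
    Hypothetical helper written for illustration only.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base sample
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy, made-up feature vectors (not taken from SVD or AVFAD)
minority = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3]]
majority = [[1.0, 1.1], [1.2, 0.9], [0.9, 1.0], [1.1, 1.2],
            [1.3, 1.0], [1.0, 0.8], [1.2, 1.2]]

# Oversample the minority class until both classes have the same size
n_missing = len(majority) - len(minority)
new_samples = smote(minority, n_missing, k=2)
minority_balanced = minority + new_samples
```

In a pipeline like the one described, oversampling of this kind would normally be applied to the training partition only, so that the test partition keeps its original class distribution; the balanced feature vectors would then be passed to the classifier.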