Politécnica de Madrid

Have you ever wondered how a speech therapist rates your voice or speech? Artificial intelligence makes the process automatic and more objective.

Researchers from UPM and Universidad de Antioquia have developed an automatic system that objectively analyzes and rates a patient's voice using artificial intelligence.


Due to the lack of objective means, the evaluation of the extent of a voice/speech disorder and of the efficacy of a treatment (including surgical procedures) relies on the perceptual criteria of the evaluator (phoniatrician, otolaryngologist and/or speech therapist) and on the patient's self-assessment. This introduces significant distortions in the care process.

Voice/speech quality is usually evaluated subjectively in the clinic through listening tests: the specialist listens to the patient and rates the speech/voice according to certain perceptual aspects, such as the grade of pathology, roughness, and breathiness. Each of these aspects is given a numeric score. This evaluation process is highly subjective and varies strongly with the evaluator (his/her experience, background, and perceptual training), and also with other factors such as tiredness, stress, psychopathological condition, and environmental noise. The only way to remove all these sources of variability is to introduce a technique that makes the process more objective. Progress in today’s artificial intelligence methods opens up the possibility of developing computational models to tackle these problems.

In this context, and taking advantage of artificial intelligence techniques, researchers from Universidad Politécnica de Madrid and Universidad de Antioquia have developed an automatic system that objectively analyzes and rates the voice of a patient following the same criteria [1][2].

The automatic system works like an artificial ear: it automatically evaluates the most significant aspects rated by the specialists by comparing the patient’s voice with artificial models generated using signal processing and machine learning techniques. The procedure is simple, easy to use, cheap, and completely non-invasive, since it only requires recording the voice with a microphone; software then provides an evaluation of the recording.

Results have demonstrated that the artificial models are accurate enough to be used in clinical practice, with a precision above the threshold currently accepted in the medical setting. In this respect, a blind clinical validation has shown that the error made by the artificial system is lower than that of a well-trained expert evaluator.

The system represents a step forward in the objective evaluation of voice/speech quality, removing the subjectivity and errors introduced by human evaluators. This is especially relevant in collaborative contexts, since the patient is commonly evaluated by different specialists at different stages of the health care process, which makes agreement difficult.

Beyond the clinical setting, the system also has implications in medico-legal and forensic contexts that require an objective evaluation of the voice.

Juan I. Godino-Llorente, the principal investigator of this project, says that he became interested in the subject after attending a clinical session with three otolaryngologists and a speech therapist, who were discussing the efficacy of a rehabilitation therapy after a surgical procedure. The patient's voice quality had improved after the procedure, but it was interesting to see how the different specialists had diverging views on the aspects that had improved. The meeting was repeated one week later, and new divergences appeared, even within the evaluations made by each expert. This made him think about the need to make the process more objective by developing artificial models.


1. J. D. Arias-Londoño; J. A. Gómez-García; J. I. Godino-Llorente, “Multimodal and multi-output deep learning architectures for the automatic assessment of voice quality using the GRB scale”, IEEE J. Selected Topics in Signal Processing, Vol. 14(2), pp. 413-422, Feb. 2020
2. J. A. Gómez-García; L. Moro-Velázquez; J. Mendes-Laureano; G. Castellanos-Dominguez; J. I. Godino-Llorente, “Emulating the Perceptual Capabilities of a Human Evaluator to map the GRB Scale for the Assessment of Voice Disorders”, Engineering Applications of Artificial Intelligence, Vol. 82, pp. 236-251, 2019