Observatorio de I+D+i UPM


Research reports

Journal articles:

Spatial Features Selection for Unsupervised Speaker Segmentation and Clustering

Year: 2017

Research areas

Electronic and communications technology,
Electrical, electronic and automatic engineering

Data

Description
The selection of the best features to be used in expert systems is a key issue in obtaining satisfactory performance. Unsupervised speaker segmentation and clustering (also called "diarization") is the task of automatically identifying the number of participants in a meeting and determining their speaking turns. This is part of an intelligent system that replaces human intervention in several tasks related to automatic language and speech processing. The segmentation and clustering of speakers is crucial for automatically transcribing any audio recording in which several people take turns speaking. The task is necessary for automatically archiving the interventions of several people in meetings, radio broadcasts, lectures, parliamentary sessions, etc., since a simple transcription of what is said, without assigning it to a specific speaker, makes the information unusable. Automating this task would save the enormous amounts of resources currently spent on human transcribers. When used online, it could also point a video camera automatically at the person talking during a videoconference with multiple speakers, thus replacing a human operator. Furthermore, it could help to scan large amounts of audio automatically in search of crimes or of the interventions of a particular person. In the case of recordings with multiple distant microphones (MDM), spatial features may and should be used. The most widely used spatial features in diarization are the Time Delay of Arrival (TDOA) features. These delays are extracted from pairs of microphones of unknown location and quality, which makes the selection of the best pairs highly advisable. This paper analyses this issue and proposes and evaluates several methods that significantly improve performance in both speaker error rate (SER) and computational time.
The methods propose a selection of TDOA features based on the quality of the cross-correlation of signals coming from different pairs of microphones. We prove that the use of the wrong pairs can be highly detrimental to the overall performance. The proposed methods, based on cross-correlation, are compared and combined with two other selection methods: one based on the dynamic range of the delay features, and one that selects every available pair of microphones and then applies a reduction in dimensionality. Although all algorithms achieve some improvement, it is proved that the selection methods based on cross-correlation have the fewest errors. The improvements over the baseline system for the two best proposed systems are 25.14% and 33.70% for the development set, and 55.06% and 46.09% for the test set. Furthermore, the best method for the test set also reduces the computational cost by 20%.
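As a rough illustration of ranking microphone pairs by cross-correlation quality, the following Python sketch estimates a delay for each pair with GCC-PHAT and ranks the pairs by their mean correlation peak. GCC-PHAT, the peak-height score, the frame length, and the names `gcc_phat` and `rank_pairs` are all assumptions made for this example; the abstract does not specify the paper's actual delay estimator or selection criteria.

```python
import numpy as np
from itertools import combinations


def gcc_phat(x, y, max_lag):
    """Estimate the delay of x relative to y (in samples) with GCC-PHAT.

    Returns (delay, peak), where peak is the height of the correlation
    maximum. Assumes 1 <= max_lag < len(x). Illustrative helper only.
    """
    n = len(x) + len(y)                      # zero-pad to avoid wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)                   # cross-power spectrum
    cross /= np.abs(cross) + 1e-12           # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that index i corresponds to lag (i - max_lag).
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    peak = int(np.argmax(cc))
    return peak - max_lag, float(cc[peak])


def rank_pairs(channels, frame_len=1024, max_lag=20):
    """Rank microphone pairs by their mean GCC-PHAT peak over all frames.

    The mean peak height is an assumed stand-in for "cross-correlation
    quality"; higher values suggest a more reliable delay estimate.
    """
    n_frames = min(len(c) for c in channels) // frame_len
    scores = {}
    for i, j in combinations(range(len(channels)), 2):
        peaks = []
        for f in range(n_frames):
            seg = slice(f * frame_len, (f + 1) * frame_len)
            _, peak = gcc_phat(channels[i][seg], channels[j][seg], max_lag)
            peaks.append(peak)
        scores[(i, j)] = float(np.mean(peaks))
    # Best-scoring pairs first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Under these assumptions, keeping only the first few entries returned by `rank_pairs` would give the subset of pairs whose TDOA features are passed on to the diarization system.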
International	Yes
ISI JCR	Yes
Journal title	Expert Systems With Applications
ISSN	0957-4174
JCR impact factor	3.928
Impact information
Volume
DOI	10.1016/j.eswa.2016.12.005
Journal number	73
From page	27
To page	42
Month	May
Ranking	Journal Rank in Category 18/133; Quartile in category Q1

This activity belongs to the research reports

Participants

Author: Beatriz Martinez Gonzalez UPM
Author: Jose Manuel Pardo Muñoz UPM
Author: Julián David Echeverry Correa, Facultad de Ingenierías, Programa de Ingeniería Eléctrica, Universidad Tecnológica de Pereira, Colombia
Author: Ruben San Segundo Hernandez UPM

Related research groups, departments, centers and R&D&I institutes

Creator: Research group: Grupo de Tecnología del Habla
Department: Ingeniería Electrónica
R&D&I center or institute: Centro de I+D+i en Procesado de la Información y Telecomunicaciones