Observatorio de I+D+i UPM (UPM R&D&I Observatory)

Research reports
HIFI-MM1: Database for Voice Control of a HIFI System
Research areas
  • Electronic and communications technology
  • Electrical, electronic and automatic engineering
This database is composed of 100 different sentences spoken by 13 speakers (7 male, 6 female), giving a total of 1300 sentences related to the application domain. Using a k-fold approach, we have split the database into ten different folds, each with 130 sentences picked at random from the database. Each sentence of the database has been manually labeled with its appropriate dialogue items. On average, each sentence refers to 4.31 concepts and 2.17 goals.

From these folds we build three different sets: a training set composed of eight folds (1040 sentences), and a validation set and a test set of one fold each (130 sentences). Using round-robin, we carry out ten experiments. In each one we use the training subset to build the background LM, while the validation subset is used to tune the different parameters: LM weight (LMW), inter-word penalty (IWP), the concept and goal thresholds, C and G, and the interpolation weight with the background LM, WB. Using the test subset to evaluate the performance of the ASR, the baseline results (without dynamic LM interpolation) show a word error rate of 5.33%.

We have evaluated the clustering approach using slots and goals separately, that is, using only semantic-based or only intention-based information to estimate the dynamic LM. As we stated in Section 3, the number of groups considered in each approach is 10 (concept-based grouping) and 4 (goal-based grouping).

Finally, we emphasize that we have analyzed the results of the recognition process when rescoring an utterance with the information obtained from that same utterance. We will further use these results as an oracle, or an upper bound on the performance of our LM adaptation approach.

This document describes the generalities of the HIFI-MM1 corpus. $DB_ROOT will refer to the root directory where the corpus has been stored.
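The round-robin partition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say which fold serves as validation and which as test in each experiment, so the rotation below (test fold k, validation fold k+1) is a guess, and the sentence identifiers, the seed and the function name `round_robin_splits` are hypothetical.

```python
import random


def round_robin_splits(items, n_folds=10, seed=0):
    """Yield (train, validation, test) lists, one triple per experiment.

    Assumption: in experiment k, fold k is the test set and fold k+1
    (wrapping around) is the validation set; the source does not state
    the actual assignment.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # random assignment to folds
    fold_size = len(shuffled) // n_folds
    folds = [shuffled[i * fold_size:(i + 1) * fold_size]
             for i in range(n_folds)]
    for k in range(n_folds):
        val_idx = (k + 1) % n_folds
        test = folds[k]
        validation = folds[val_idx]
        # The remaining eight folds form the training set.
        train = [s for j, fold in enumerate(folds)
                 if j not in (k, val_idx) for s in fold]
        yield train, validation, test


# Hypothetical identifiers: 13 speakers x 100 sentences = 1300 utterances.
sentence_ids = [f"spk{s:02d}_sent{u:03d}"
                for s in range(1, 14) for u in range(1, 101)]
splits = list(round_robin_splits(sentence_ids))
```

With 1300 utterances and ten folds, each experiment gets 1040 training, 130 validation and 130 test sentences, and every utterance is used for testing exactly once across the ten experiments.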
The HIFI-MM1 corpus was designed to fulfill the following objectives:
  • Allow evaluation and fine-tuning of the multi-channel audio acquisition system in the EDECÁN project demonstration room at GTH.
  • Allow evaluation of the performance of the acoustic modules, mainly localization, recognition and beamforming.
  • Allow evaluation of the speech understanding modules.
All of these objectives relate to a domain in which the goal is to control a HIFI system (Sharp CDC410) by voice.
In use
Registration date
Registration number
This activity belongs to research reports
  • Author: Fernando Fernandez Martinez (UPM)
  • Author (order 4): Javier Macías Guarasa (Universidad de Alcalá de Henares)
  • Author: Javier Ferreiros Lopez (UPM)
  • Author: Roberto Barra Chicote (UPM)
  • Author: Juan Manuel Lucas Cuesta (UPM)
  • Author: Juan Manuel Montero Martinez (UPM)
  • Author: Ruben San Segundo Hernandez (UPM)
  • Author: Ricardo de Cordoba Herralde (UPM)
  • Author: Luis Fernando D'Haro Enriquez (UPM)
  • Author: Jose Manuel Pardo Muñoz (UPM)
Related research groups, departments, centers and R&D&I institutes
  • Creator: Research group: Grupo de Tecnología del Habla
  • Department: Ingeniería Electrónica (Electronic Engineering)
S2i 2021 Research Observatory @ UPM, in collaboration with the Consejo Social UPM
Co-funded by MINECO under the INNCIDE 2011 Programme (OTR-2011-0236)
Co-funded by MINECO under the INNPACTO Programme (IPT-020000-2010-22)