Research reports
Conference papers:
A Comparison of Open-Source Segmentation Architectures for Dealing with Imperfect Data from the Media in Speech Synthesis
Year: 2014

Research areas
  • Electronic and communications technology
  • Electrical, electronic, and automation engineering

Details
Description
Traditional Text-To-Speech (TTS) systems have been developed using specially designed, non-expressive scripted recordings. In order to develop a new generation of expressive TTS systems in the Simple4All project, real recordings from the media should be used for training new voices with a whole new range of speaking styles. However, to process this more spontaneous material, the new systems must be able to deal with imperfect data (multi-speaker recordings, background and foreground music and noise), filtering out low-quality audio segments and creating mono-speaker clusters. In this paper we compare several architectures for combining speaker diarization and music and noise detection which improve the precision and overall quality of the segmentation.
International
Yes
Conference name
15th Annual Conference of the International Speech Communication Association
Type of participation
960
Conference location
Singapore
Peer reviewed
Yes
ISBN or ISSN
2308-457X
DOI
Conference start date
14/09/2014
Conference end date
18/09/2014
From page
2370
To page
2374
Proceedings title
Proceedings 15th Annual Conference of the International Speech Communication Association

This activity belongs to the research reports

Participants
  • Author: Ascensión Gallardo Antolín Dept. of Signal Theory and Communications, Universidad Carlos III de Madrid
  • Author: Juan Manuel Montero Martinez UPM
  • Author: Simon King The Centre for Speech Technology Research, University of Edinburgh, UK

Related research groups, departments, centers, and R&D&I institutes
  • Creator: Research group: Grupo de Tecnología del Habla
  • Department: Ingeniería Electrónica