Description
|
|
---|---|
This paper proposes an architecture, based on statistical machine translation, for developing the text normalization module of a text-to-speech conversion system. The main goal is a language-independent, data-driven text normalization module that is flexible enough to deal with all the situations that arise in this task. The proposed architecture is composed of three main modules: a tokenizer that splits the input text into a token graph (tokenization), a phrase-based translation module (token translation), and a post-processing module that removes some tokens. The paper presents initial experiments on numbers and abbreviations. The very good results obtained validate the proposed architecture. | |
International
|
Yes |
Conference name
|
IberSPEECH 2012 |
Participation type
|
960 |
Conference location
|
Madrid, Spain |
Peer reviewed
|
Yes |
ISBN or ISSN
|
84-616-1535-2 |
DOI
|
|
Conference start date
|
21/11/2012 |
Conference end date
|
22/11/2012 |
From page
|
204 |
To page
|
213 |
Proceedings title
|
VII Jornadas en Tecnología del Habla and III Iberian SLTech Workshop |
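
The three-module architecture summarized in the description (tokenization, token translation, post-processing) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the toy `PHRASE_TABLE`, and the `<sep>` marker are hypothetical; the tokenizer returns a flat token list rather than a token graph; and a greedy longest-match dictionary lookup stands in for a real statistical phrase-based decoder.

```python
import re

# Hypothetical toy phrase table (source token sequence -> target words).
# A real system would learn scored phrase pairs from parallel data.
PHRASE_TABLE = {
    ("Dr", "."): ["doctor"],
    ("2",): ["two"],
    ("1", "2"): ["twelve"],
}
MAX_PHRASE_LEN = max(len(key) for key in PHRASE_TABLE)

# Hypothetical marker tokens that the post-processing step removes.
REMOVABLE = {"<sep>"}


def tokenize(text):
    """Split text into words, single digits, and punctuation marks.

    Simplification: returns a flat list, not a token graph.
    """
    return re.findall(r"[A-Za-z]+|\d|[^\sA-Za-z\d]", text)


def translate(tokens):
    """Greedy longest-match lookup; a stand-in for a phrase-based decoder."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in PHRASE_TABLE:
                out.extend(PHRASE_TABLE[key])
                i += n
                break
        else:
            # No phrase matched at this position: copy the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


def post_process(tokens):
    """Remove marker tokens that should not reach the synthesizer."""
    return [tok for tok in tokens if tok not in REMOVABLE]


def normalize(text):
    """Run the three stages in sequence and join the result."""
    return " ".join(post_process(translate(tokenize(text))))


print(normalize("Dr. Smith has 12 cats"))
```

Splitting numbers into single digits lets the lookup treat a number as a sequence of tokens to be translated as a phrase, which is one plausible way to frame number expansion as translation; the paper's actual token design may differ.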