| Field | Value |
|---|---|
| Description | In this paper we present our results on using Recurrent Neural Network Language Model (RNNLM) scores trained on different phone-gram orders and obtained with different phonetic ASR recognizers. To avoid data-sparseness problems and to reduce the vocabulary of all possible n-gram combinations, a K-means clustering procedure was applied to phone vector embeddings as a pre-processing step. We provide further details on the vocabulary-reduction efforts for 2-grams and 3-grams. Additional experiments to optimize the number of classes, the batch size, the number of hidden neurons, and the state unfolding are also presented. We worked with the KALAKA-3 database under the plenty-closed condition. Thanks to our clustering technique and the combination of higher-order phone-grams, our phonotactic system performs more than 10% better than the unigram-based RNNLM system. The resulting RNNLM scores are also calibrated and fused with scores from an acoustic i-vector system and a traditional PPRLM system. This fusion yields additional improvements, showing that these systems provide complementary information to the LID system. |
| International | Yes |
| Conference name | Iberspeech 2016 |
| Type of participation | 970 |
| Conference venue | Lisbon, Portugal |
| Peer-reviewed | Yes |
| ISBN or ISSN | 978-3-319-49169-1 |
| DOI | |
| Conference start date | 23/11/2016 |
| Conference end date | 25/11/2016 |
| From page | 109 |
| To page | 118 |
| Proceedings title | IberSpeech 2016 - Proceedings |
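The vocabulary-reduction step described in the abstract (K-means clustering of phone embeddings, then building n-grams over cluster IDs instead of raw phones) can be sketched as below. This is a minimal illustration only: the phone set, embedding dimension, and number of clusters are arbitrary assumptions, not the paper's actual configuration.

```python
# Hypothetical sketch of the abstract's vocabulary-reduction idea:
# cluster phone embeddings with K-means, then form n-grams over
# cluster IDs so the n-gram vocabulary shrinks from |phones|^n to K^n.
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain K-means over rows of X; returns a cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # squared Euclidean distance of every point to every center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Toy phone inventory and random "embeddings" (illustrative assumption;
# the paper would use embeddings learned from the ASR/LM training data).
phones = ["a", "e", "i", "o", "u", "p", "t", "k", "s", "f"]
rng = np.random.default_rng(1)
emb = rng.normal(size=(len(phones), 8))

labels = kmeans(emb, k=4)
phone2cluster = dict(zip(phones, labels))

def cluster_ngrams(seq, n):
    """Map a phone sequence to cluster IDs and slide an n-gram window."""
    ids = [phone2cluster[p] for p in seq]
    return [tuple(ids[i:i + n]) for i in range(len(ids) - n + 1)]

trigrams = cluster_ngrams(["p", "a", "t", "a", "k", "a"], 3)
print(len(trigrams))  # a 6-phone sequence yields 4 trigrams
```

With K=4 clusters, the 3-gram vocabulary upper bound drops from 10^3 possible phone trigrams to 4^3 cluster trigrams, which is the sparseness-reduction effect the abstract describes.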