Abstract
|
|
---|---|
Neural networks (NNs) have been used extensively in speech technology systems. In this paper, we present two novel applications of NNs, one in speech recognition and one in text-to-speech systems. In very large vocabulary speech recognition systems that use the hypothesis-verification paradigm, the verification stage is usually the most time-consuming. State-of-the-art systems combine hypothesized search spaces of fixed size with advanced pruning techniques. We propose a novel strategy that calculates the hypothesized search space dynamically, using neural networks as the estimation module and designing the input feature set with a careful greedy selection approach. The main achievement is a statistically significant relative decrease in error rate of 33.53%, together with a relative decrease in average computational demands of up to 19.40%. Prosodic modeling is one of the most important tasks in developing a new text-to-speech synthesizer, especially for a high-quality, restricted-domain system with a female voice. Our twofold objective is to obtain accurate predictors for both the fundamental frequency (F0) curve and phoneme duration in a Spanish text-to-speech system by minimizing the model estimation error with a neural network estimator, which has proved to be an excellent modeling tool. The resulting system predicts prosody with very good results (for duration: an RMS error of 15.5 ms and a correlation coefficient of 0.8975; for F0: an RMS error of 19.80 Hz and a relative RMS error of 0.43), clearly improving on our previous rule-based system. | |
International
|
Yes |
JCR
|
Yes |
Title
|
INTELLIGENT AUTOMATION AND SOFT COMPUTING |
ISSN
|
1079-8587 |
Impact factor JCR
|
0.224 |
Impact info
|
|
Volume
|
15 |
|
|
Journal number
|
4 |
From page
|
631 |
To page
|
646 |
Month
|
January |
Ranking
|