Abstract
|
|
---|---|
This database is composed of 100 different sentences spoken by 13 speakers (7 male, 6 female), giving a total of 1300 sentences related to the application domain. Using a k-fold approach, we split the database into ten different folds, each with 130 sentences picked at random from the database. Each sentence has been manually labeled with its appropriate dialogue items. On average, each sentence refers to 4.31 concepts and 2.17 goals. From these folds we build three different sets: a training set composed of eight folds (1040 sentences), and a validation set and a test set, each with one fold (130 sentences). Using round-robin, we run ten experiments. In each one we use the training subset to build the background LM, while the validation subset is used to tune the different parameters: LM weight (LMW), inter-word penalty (IWP), the concept and goal thresholds, C and G, and the interpolation weight with the background LM, WB. Using the test subset to evaluate the performance of the ASR, the baseline results (without dynamic LM interpolation) show a word error rate of 5.33%. We have evaluated the clustering approach using slots and goals separately, that is, using only semantic-based or only intention-based information to estimate the dynamic LM. As stated in Section 3, the number of groups considered in each approach is 10 (concept-based grouping) and 4 (goal-based grouping). Finally, we emphasize that we have analyzed the results of the recognition process when rescoring an utterance with the information obtained from that same utterance. We will use these results as an oracle, or upper bound, on the performance of our LM adaptation approach. This document describes the generalities of the HIFI-MM1 corpus. $DB_ROOT will refer to the root directory where the corpus has been stored.
The HIFI-MM1 corpus was designed to fulfill the following objectives: (1) allow evaluation and fine-tuning of the multi-channel audio acquisition system in the EDECÁN project demonstration room at GTH; (2) allow evaluation of the performance of the acoustic modules, mainly localization, recognition and beamforming; (3) allow evaluation of the speech understanding modules. All of them relate to a domain in which the objective is to control a HIFI system (Sharp CDC410) by voice. | |
International
|
No |
Under exploitation
|
No |
Registration Date
|
27/09/2010 |
Registry number
|
M-7799/2010 |
Owners
|