Abstract

In this work we present Resource2Vec, a neural network embedding that represents the resources making up Linked Data (LD) corpora. As with well-known word and document embeddings, a vector representation of these resources makes them more convenient to process computationally. We provide a quantitative analysis of these embeddings. We also demonstrate their usefulness in an Automatic Speech Recognition (ASR) task by designing a term discovery strategy that allows out-of-vocabulary (OOV) terms in a Large Vocabulary Continuous Speech Recognition (LVCSR) system to be discovered and inserted into the final transcription. First, we detect where a potential OOV term may have been uttered in the speech segments output by the LVCSR system. Second, we search LD corpora for OOV candidates. This search is guided by distance measurements between the transcription context surrounding the potential-OOV speech segment and the LD resources in Resource2Vec format, and it yields a set of candidates, which we rank mainly on the basis of the phone transcription of that segment. Finally, we decide whether or not to incorporate a candidate into the final transcription. The results show that, when our strategy is applied to Spanish speech, the transcription improves significantly in terms of Word Error Rate (WER).
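The abstract does not name the distance measure, the ranking score or the decision rule, so the following is only a minimal sketch of the general search, rank and decide idea under stated assumptions, not the authors' implementation. It assumes that the context word vectors are averaged and compared to resource vectors with cosine similarity, that the top candidates are re-ranked by Levenshtein distance between phone sequences, and that a hypothetical threshold `max_norm_dist` decides whether the best candidate is inserted. The embeddings, the one-character-per-phone strings and the resource names in `RESOURCES` are toy values, not Resource2Vec output.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical toy data: resource name -> (embedding vector, phone sequence).
# Real Resource2Vec vectors would come from a trained model; these are made up.
RESOURCES: Dict[str, Tuple[List[float], List[str]]] = {
    "Cervantes": ([0.9, 0.1, 0.0], list("Terbantes")),
    "Quevedo": ([0.8, 0.2, 0.1], list("kebedo")),
    "Valencia": ([0.0, 0.2, 0.9], list("balenTja")),
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors: List[Sequence[float]]) -> List[float]:
    """Average the context word vectors into a single context vector."""
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete pa
                            curr[j - 1] + 1,            # insert pb
                            prev[j - 1] + (pa != pb)))  # substitute
        prev = curr
    return prev[-1]


def choose_oov(context_vectors: List[Sequence[float]],
               segment_phones: List[str],
               resources: Dict[str, Tuple[List[float], List[str]]],
               top_k: int = 2,
               max_norm_dist: float = 0.5) -> Optional[str]:
    """Search, rank and decide; returns a resource name or None."""
    ctx = mean_vector(context_vectors)
    # Search: keep the top_k resources closest to the transcription context.
    by_context = sorted(resources, key=lambda n: cosine(ctx, resources[n][0]),
                        reverse=True)[:top_k]
    if not by_context:
        return None
    # Rank: prefer the candidate whose phones best match the segment.
    ranked = sorted(by_context,
                    key=lambda n: edit_distance(resources[n][1], segment_phones))
    best = ranked[0]
    # Decide: accept only if the length-normalised phone distance is small.
    dist = edit_distance(resources[best][1], segment_phones)
    return best if dist / max(len(segment_phones), 1) <= max_norm_dist else None


context = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
print(choose_oov(context, list("Tervantes"), RESOURCES))
```

With these toy values the context retrieves "Cervantes" and "Quevedo", and the phone comparison selects "Cervantes" (one substitution out of nine phones). A segment whose phones match no candidate returns `None`, so the transcription is left unchanged.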
International: Yes
JCR: Yes
Journal title: Expert Systems With Applications
ISSN: 0957-4174
Impact factor (JCR): 3.768
Volume: 112
Pages: 301-320
Month: June
Ranking: Journal rank in category 20/132