Research reports
Conference papers:
n-gram Frequency Ranking with additional sources of information in a multiple-Gaussian classifier for Language Identification
Year: 2008

Research areas
  • Artificial intelligence,
  • Electronics industry

Data
Description
We present new results for our n-gram frequency ranking technique for language identification. We use a parallel phone recognizer (as in PPRLM), but instead of a language model we build a ranking of the most frequent n-grams. We then compute the distance between the ranking of the input sentence and the ranking of each language, based on the difference in the relative position of each n-gram. The goal of this ranking is to model reliably a longer span than PPRLM does. The approach outperforms PPRLM (15% relative improvement) thanks to the inclusion of 4-grams and 5-grams in the classifier. We also show that combining this technique with other sources of information (feature vectors in our classifier) is advantageous over PPRLM, and we give a detailed analysis of the relevance of these sources together with a simple feature selection technique to cope with long feature vectors. The test database has been significantly enlarged using cross-fold validation, so comparisons are now more reliable.
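The ranking-and-distance idea in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`ngram_ranking`, `ranking_distance`, `identify`), the `top_k` cutoff, the fixed penalty for unseen n-grams, and the use of a plain absolute rank difference are all assumptions made for illustration. The sketch also uses a single phone sequence and picks the closest language directly, whereas the paper uses parallel recognizers and feeds its features to a multiple-Gaussian classifier.

```python
from collections import Counter


def ngram_ranking(phones, n_max=5, top_k=100):
    """Map each of the top_k most frequent n-grams (n = 1..n_max) to its rank."""
    counts = Counter(
        tuple(phones[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(phones) - n + 1)
    )
    # Sort by descending count, then by the n-gram itself, so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {gram: rank for rank, (gram, _) in enumerate(ordered[:top_k])}


def ranking_distance(sentence_rank, language_rank, penalty=None):
    """Mean absolute rank difference between a sentence and a language ranking.

    N-grams missing from the language ranking receive a fixed penalty
    (assumed here to be the size of the language ranking).
    """
    if not sentence_rank:
        return 0.0
    if penalty is None:
        penalty = len(language_rank)
    total = sum(
        abs(pos - language_rank[gram]) if gram in language_rank else penalty
        for gram, pos in sentence_rank.items()
    )
    return total / len(sentence_rank)


def identify(phones, language_ranks, n_max=5, top_k=100):
    """Return the language whose ranking is closest to the input sentence."""
    sentence_rank = ngram_ranking(phones, n_max, top_k)
    return min(
        language_ranks,
        key=lambda lang: ranking_distance(sentence_rank, language_ranks[lang]),
    )


# Toy usage with made-up "phone" symbols; real input would be recognizer output.
ranks = {
    "A": ngram_ranking(list("abababababab")),
    "B": ngram_ranking(list("cdcdcdcdcd")),
}
print(identify(list("ababab"), ranks))
```

Setting `n_max=5` mirrors the 4-gram and 5-gram inclusion mentioned in the description; lowering it to 3 would restrict the ranking to the shorter span that a typical PPRLM language model covers.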
International
No
Conference name
V Jornadas de Tecnología del Habla
Type of participation
960
Conference location
Bilbao
Reviewers
Yes
ISBN or ISSN
978-84-9860-169-5
DOI
Conference start date
12/11/2008
Conference end date
14/11/2008
From page
49
To page
52
Proceedings title
Actas de V Jornadas de Tecnología del Habla

This activity belongs to research reports

Participants

Related research groups, departments, centers and R&D&I institutes
  • Creator: Research group: Grupo de Tecnología del Habla
  • Department: Ingeniería Electrónica