UPM R&D&I Observatory (Observatorio de I+D+i UPM)

Research reports
Conference papers:
Probabilistic Leverage Scores for Parallelized Unsupervised Feature Selection.
Year: 2017
Research areas
  • Computer science and computing technology
Details
Description
Dimensionality reduction is often crucial for the application of machine learning and data mining. Feature selection methods can be employed for this purpose, with the advantage of preserving interpretability. There exist unsupervised feature selection methods based on matrix factorization algorithms, which can help choose the most informative features in terms of approximation error. Randomized methods have recently been proposed that provide better theoretical guarantees and better approximation errors than their deterministic counterparts, but their computational costs can be significant when dealing with big, high-dimensional data sets. Some existing randomized and deterministic approaches require computing the singular value decomposition, in O(mn min(m, n)) time (for m samples and n features), to obtain leverage scores. This compromises their applicability to domains of even moderately high dimensionality. In this paper we propose the use of Probabilistic PCA to compute the leverage scores in O(mnk) time (for k principal components), enabling the application of some of these randomized methods to large, high-dimensional data sets. We show that with this approach we can rapidly provide an approximation of the leverage scores that works well in this context. In addition, we offer a parallelized version over the emerging Resilient Distributed Datasets (RDD) paradigm on Apache Spark, making it horizontally scalable for enormous numbers of data instances. We validate the performance of our approach on different data sets composed of real-world and synthetic data.
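A minimal NumPy sketch of the idea described in the abstract, not the authors' code. It assumes the usual definition of column leverage scores relative to rank k (squared row norms of the top-k right singular vectors), fixes the PPCA mean at zero so that the learned subspace is that of X itself, and uses the standard Tipping and Bishop EM updates, which cost O(mnk) per iteration and never form an n x n matrix. The function names, the fixed iteration count, and the without-replacement sampling in `sample_columns` are hypothetical illustrations. Every quantity the EM step needs (X^T Z, Z^T Z, ||X||_F^2 and the cross term) is a sum over rows, which is the property that lets this kind of update be split across data partitions such as RDD partitions on Spark; the sketch itself runs on a single machine.

```python
import numpy as np


def exact_leverage_scores(X, k):
    """Exact column leverage scores: squared row norms of the top-k right
    singular vectors. The full SVD costs O(mn min(m, n)) for an m x n matrix."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    vk = vt[:k].T                       # n x k, one row per feature
    return np.sum(vk ** 2, axis=1)


def ppca_leverage_scores(X, k, n_iter=100, seed=0):
    """Approximate column leverage scores with EM for Probabilistic PCA.

    Assumption of this sketch: the mean is fixed at zero, so the loading
    matrix W converges to a basis of the top-k right singular subspace of X.
    n_iter is a fixed count; a convergence check could be used instead.
    """
    m, n = X.shape
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n, k))     # n x k loading matrix
    sigma2 = 1.0                        # isotropic noise variance
    fro2 = np.sum(X ** 2)               # ||X||_F^2, a sum over rows
    eye = np.eye(k)
    for _ in range(n_iter):
        # E-step: posterior means E[z_i] stacked as the rows of Z (m x k).
        M_inv = np.linalg.inv(W.T @ W + sigma2 * eye)
        Z = (X @ W) @ M_inv             # O(mnk)
        # Sum over samples of the posterior second moments E[z_i z_i^T].
        Ezz = m * sigma2 * M_inv + Z.T @ Z
        # M-step: only row-wise sums are needed (X^T Z, Z^T Z, cross term).
        W_new = (X.T @ Z) @ np.linalg.inv(Ezz)   # O(mnk)
        cross = np.sum((X @ W_new) * Z)  # sum_i E[z_i]^T W_new^T x_i
        sigma2 = (fro2 - 2.0 * cross
                  + np.trace(Ezz @ W_new.T @ W_new)) / (m * n)
        W = W_new
    # Leverage scores depend only on the subspace, so orthonormalize W first.
    Q, _ = np.linalg.qr(W)
    return np.sum(Q ** 2, axis=1)


def sample_columns(scores, c, seed=0):
    """Hypothetical helper: draw c distinct column indices with probability
    proportional to the leverage scores (a without-replacement simplification)."""
    p = scores / scores.sum()
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(len(scores), size=c, replace=False, p=p))


# Usage on synthetic data: a rank-k signal plus small Gaussian noise.
rng = np.random.default_rng(42)
m, n, k = 500, 60, 5
X = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
X += 0.01 * rng.standard_normal((m, n))
exact = exact_leverage_scores(X, k)
approx = ppca_leverage_scores(X, k)
cols = sample_columns(approx, c=10)
print("max abs difference:", np.max(np.abs(exact - approx)))
print("selected columns:", cols)
```

Because each M-step has the form W_new = X^T X W times a k x k matrix, the iteration behaves like subspace iteration on X^T X: it converges quickly when there is a clear gap after the k-th singular value, and needs more iterations when that gap is small.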
International
Yes
Conference name
International Work-Conference on Artificial Neural Networks
Type of participation
960
Conference location
Cádiz, Spain
Peer reviewed
Yes
ISBN or ISSN
978-3-319-59146-9
DOI
10.1007/978-3-319-59147-6_61
Conference start date
14/06/2017
Conference end date
16/06/2017
First page
722
Last page
733
Proceedings title
Advances in Computational Intelligence. IWANN 2017. Lecture Notes in Computer Science, vol 10306. Springer, Cham
This activity is part of the research reports
Participants
  • Author: Bruno Ordozgoiti Rubio (UPM)
  • Author: Sandra Maria Gomez Canaval (UPM)
  • Author: Bonifacio Alberto Mozo Velasco (UPM)
Related research groups, departments, and R&D&I centers and institutes
  • Creator: Research group: Grupo de Modelización Matemática y Biocomputación (Mathematical Modelling and Biocomputation Group)
  • Department: Sistemas Informáticos (Computer Systems)
S2i 2021 Research Observatory @ UPM, in collaboration with the UPM Social Council (Consejo Social UPM)
Co-funded by MINECO under the INNCIDE 2011 Programme (OTR-2011-0236)
Co-funded by MINECO under the INNPACTO Programme (IPT-020000-2010-22)