Research reports
Conference papers:
Probabilistic Leverage Scores for Parallelized Unsupervised Feature Selection.
Year: 2017

Research areas
  • Computer science and information technology

Data
Description
Dimensionality reduction is often crucial for the application of machine learning and data mining. Feature selection methods can be employed for this purpose, with the advantage of preserving interpretability. There exist unsupervised feature selection methods based on matrix factorization algorithms, which can help choose the most informative features in terms of approximation error. Randomized methods have been proposed recently to provide better theoretical guarantees and better approximation errors than their deterministic counterparts, but their computational costs can be significant when dealing with big, high-dimensional data sets. Some existing randomized and deterministic approaches require the computation of the singular value decomposition in O(mn min(m, n)) time (for m samples and n features) for providing leverage scores. This compromises their applicability to domains of even moderately high dimensionality. In this paper we propose the use of Probabilistic PCA to compute the leverage scores in O(mnk) time, enabling the applicability of some of these randomized methods to large, high-dimensional data sets. We show that using this approach, we can rapidly provide an approximation of the leverage scores that works well in this context. In addition, we offer a parallelized version over the emerging Resilient Distributed Datasets paradigm (RDD) on Apache Spark, making it horizontally scalable for enormous numbers of data instances. We validate the performance of our approach on different data sets composed of real-world and synthetic data.
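To make the contrast in the description concrete, the following is a minimal sketch and not the paper's algorithm. It assumes the rank-k leverage score of feature j is the squared norm of row j of an orthonormal basis for the top-k right singular subspace. It compares the full-SVD route with a cheaper subspace estimate obtained from EM iterations for PCA (the zero-noise limit of Probabilistic PCA), which cost O(mnk) each. The function names, the iteration count, the centering step, and the synthetic data are all assumptions made for illustration. No Spark parallelization is shown.

```python
import numpy as np


def exact_leverage_scores(A, k):
    """Rank-k leverage scores of the columns (features) of A via a full SVD."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return np.sum(Vt[:k] ** 2, axis=0)


def em_leverage_scores(A, k, n_iter=50, seed=0):
    """Approximate rank-k leverage scores using EM iterations for PCA.

    Each iteration costs O(mnk). The default n_iter is a hypothetical choice.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    W = rng.standard_normal((n, k))          # random initial basis, n x k
    for _ in range(n_iter):
        # E-step: latent coordinates of every sample, m x k
        Z = A @ W @ np.linalg.inv(W.T @ W)
        # M-step: basis that best reconstructs A from Z, n x k
        W = A.T @ Z @ np.linalg.inv(Z.T @ Z)
    Q, _ = np.linalg.qr(W)                   # orthonormal basis of span(W)
    return np.sum(Q ** 2, axis=1)


# Synthetic data (illustrative only): rank-3 signal plus small noise.
rng = np.random.default_rng(42)
m, n, k = 200, 30, 3
A = 5.0 * rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
A += 0.01 * rng.standard_normal((m, n))
A -= A.mean(axis=0)                          # center the features, as PCA assumes

exact = exact_leverage_scores(A, k)
approx = em_leverage_scores(A, k)
top_features = np.argsort(approx)[::-1][:5]  # indices of the highest-scoring features
```

Because the scores depend only on the subspace and not on the particular basis, any orthonormal basis of the estimated subspace yields the same values, which is why a QR factorization of W suffices here. A randomized selection method would then sample features with probability proportional to these scores.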
International
Yes
Conference name
International Work-Conference on Artificial Neural Networks
Type of participation
960
Conference location
Cadiz, Spain
Peer reviewed
Yes
ISBN or ISSN
978-3-319-59146-9
DOI
10.1007/978-3-319-59147-6_61
Conference start date
14/06/2017
Conference end date
16/06/2017
From page
722
To page
733
Proceedings title
Advances in Computational Intelligence. IWANN 2017. Lecture Notes in Computer Science, vol 10306. Springer, Cham

This activity belongs to research reports

Participants

Related research groups, departments, centers, and R&D&I institutes
  • Creator: Research Group: Grupo de Modelización Matemática y Biocomputación
  • Department: Sistemas Informáticos