UPM R&D&I Observatory

Research reports
Book chapter:
Massively Parallel Unsupervised Feature Selection on Spark
Research areas
  • Computer science and information technology
  • Telematics
High-dimensional data sets pose important challenges, such as the curse of dimensionality and increased computational costs. Dimensionality reduction is therefore a crucial step for most data mining applications, and feature selection techniques are one way to achieve it. However, it is now common to deal with huge data sets, and most existing feature selection algorithms are designed to run in a centralized fashion, which makes them non-scalable. Moreover, some of them require the selection process to be validated against some target, which restricts their applicability to the supervised learning setting. In this paper we propose a parallel, scalable, exact implementation of an existing centralized, unsupervised feature selection algorithm on Spark, an efficient big data framework for large-scale distributed computation that outperforms MapReduce when applied to multi-pass algorithms. We validate the efficiency of the implementation using 1 GB of real Internet traffic captured at a medium-sized ISP.
Book edition
Book publisher
Communications in Computer and Information Science
Book title
New Trends in Databases and Information Systems
From page
To page
This activity belongs to the research reports
  • Author: Bruno Ordozgoiti Rubio (UPM)
  • Author: Sandra Maria Gomez Canaval (UPM)
  • Author: Bonifacio Alberto Mozo Velasco (UPM)
Related research groups, departments, centers, and R&D&I institutes
  • Creator: Research group: Internet de Nueva Generación
S2i 2021 Research Observatory @ UPM, with the collaboration of the Consejo Social UPM
Co-financed by MINECO under the INNCIDE 2011 Programme (OTR-2011-0236)
Co-financed by MINECO under the INNPACTO Programme (IPT-020000-2010-22)