UPM R&D&I Observatory (Observatorio de I+D+i UPM)


Research Reports

Conference papers:

Massively Parallel Unsupervised Feature Selection on Spark.

Year: 2015

Research areas

Computer science and information technology

Details

Description
High-dimensional data sets pose important challenges, such as the curse of dimensionality and increased computational costs. Dimensionality reduction is therefore a crucial step in most data mining applications, and feature selection techniques are one way to achieve it. However, it is now common to deal with huge data sets, and most existing feature selection algorithms are designed to run in a centralized fashion, which makes them non-scalable. Moreover, some of them require the selection process to be validated against a target variable, which restricts their applicability to the supervised learning setting. In this paper we propose, as a novel contribution, a parallel, scalable and exact implementation of an existing centralized, unsupervised feature selection algorithm on Spark, an efficient big data framework for large-scale distributed computation that outperforms MapReduce when applied to multi-pass algorithms. We validate the efficiency of the implementation using 1 GB of real Internet traffic captured at a medium-sized ISP.
International	Yes
Conference name	19th East-European Conference on Advances in Databases and Information Systems (ADBIS). International Workshop on Big Data Applications and Principles (BigDap)
Type of participation	960
Conference location	Poitiers, France
Peer-reviewed	Yes
ISBN or ISSN	978-3-319-23201-0
DOI	10.1007/978-3-319-23201-0_21
Conference start date	08/09/2015
Conference end date	11/09/2015
First page	186
Last page	196
Proceedings title	New Trends in Databases and Information Systems. Communications in Computer and Information Science Series, Volume 539. (Funded by FP7 ONTIC project, no. 619633)

This activity belongs to the research reports.

Participants

Author: Bruno Ordozgoiti Rubio (UPM)
Author: Sandra Maria Gomez Canaval (UPM)
Author: Bonifacio Alberto Mozo Velasco (UPM)

Related research groups, departments, centres and R&D&I institutes

Creator: Research group: Internet de Nueva Generación (Next Generation Internet)
Research group: Grupo de Modelización Matemática y Biocomputación (Mathematical Modelling and Biocomputation Group)
Department: Sistemas Informáticos (Computer Systems)