Observatorio de I+D+i UPM

Research reports
Conference papers:
Feature Ranking and Selection for Big Data Sets
Year: 2016
Research areas
  • Computer science and information technology
Details
Description
The availability of big data sets has led to the successful application of machine learning and data mining to problems that were previously unsolved. The use of these techniques, though, is rarely straightforward. High dimensionality is often one of the main obstacles that must be overcome before learning an adequate model or drawing useful conclusions from large amounts of data. Rank-revealing matrix factorizations can help address this problem by permuting the columns of the input data so that linearly dependent, and thus redundant, columns are moved to the right. These factorizations, however, are designed to operate in a centralized fashion, requiring the input data to be loaded into main memory, which makes them inapplicable to large data sets. In this paper we prove that data sets comprising a huge number of rows can be easily transformed into a compact square matrix that preserves the permutation yielded by rank-revealing QR factorizations. This leads to a simple algorithm for running these factorizations on big data sets regardless of their number of rows. The nature of the transformation also makes it possible to deal with high-dimensional data with a controlled loss of precision. We offer experimental results showing that our method can provide improvements for the k-means algorithm, both in clustering results and in running time.
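The abstract does not spell out the transformation, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It relies on a standard linear algebra fact: the pivots chosen by QR with column pivoting on a tall matrix A depend only on the inner products between its columns, that is, on the n x n Gram matrix A^T A, which can be accumulated one block of rows at a time. The function names, the toy data, and the chunk size are all assumptions made for illustration. Working with the Gram matrix squares the condition number, so this stand-in is less reliable than a proper QR-based reduction when columns are nearly dependent.

```python
import math
import random


def gram_from_chunks(chunks, n):
    """Accumulate the n x n Gram matrix G = A^T A one row chunk at a time."""
    g = [[0.0] * n for _ in range(n)]
    for chunk in chunks:  # each chunk is a list of rows of length n
        for row in chunk:
            for i in range(n):
                for j in range(n):
                    g[i][j] += row[i] * row[j]
    return g


def pivots_from_gram(g):
    """Column order obtained by Cholesky with diagonal pivoting on G."""
    s = [row[:] for row in g]  # running Schur complement
    remaining = list(range(len(g)))
    order = []
    while remaining:
        # s[j][j] is the squared residual norm of column j.
        p = max(remaining, key=lambda j: s[j][j])
        order.append(p)
        remaining.remove(p)
        if s[p][p] > 0.0:
            for i in remaining:
                f = s[i][p] / s[p][p]
                for j in remaining:
                    s[i][j] -= f * s[p][j]
    return order


def pivots_from_full(rows, n):
    """Reference: greedy column pivoting on the full data (modified Gram-Schmidt)."""
    resid = [[row[j] for row in rows] for j in range(n)]
    remaining = list(range(n))
    order = []
    while remaining:
        norms = {j: math.sqrt(sum(x * x for x in resid[j])) for j in remaining}
        p = max(remaining, key=norms.get)
        order.append(p)
        remaining.remove(p)
        if norms[p] > 0.0:
            q = [x / norms[p] for x in resid[p]]
            for j in remaining:
                dot = sum(a * b for a, b in zip(q, resid[j]))
                resid[j] = [b - dot * a for a, b in zip(q, resid[j])]
    return order


# Toy data (hypothetical): 5 independent columns with different scales, plus a
# sixth column that is an exact linear combination of columns 0 and 1.
rng = random.Random(0)
scales = [3.0, 1.0, 5.0, 2.0, 4.0]
rows = []
for _ in range(400):
    x = [s * rng.gauss(0.0, 1.0) for s in scales]
    rows.append(x + [2.0 * x[0] + 3.0 * x[1]])

n = len(rows[0])
# Simulate out-of-core processing: only one block of 50 rows is visited at a time.
chunks = [rows[i:i + 50] for i in range(0, len(rows), 50)]

order_compact = pivots_from_gram(gram_from_chunks(chunks, n))
order_full = pivots_from_full(rows, n)
print(order_compact, order_full)
```

In this sketch the compact summary has size n x n regardless of the number of rows, and the column placed last is one of the three columns involved in the linear dependency. In practice a library routine such as SciPy's `scipy.linalg.qr` with `pivoting=True` would normally replace the hand-written pivoting loops.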
International
Yes
Conference name
20th East-European Conference on Advances in Databases and Information Systems - ADBIS 2016 - Workshop: BigDap
Type of participation
960
Conference venue
Prague, Czech Republic
Peer reviewed
Yes
ISBN or ISSN
978-3-319-44065-1
DOI
10.1007/978-3-319-44066-8_14
Conference start date
28/08/2016
Conference end date
31/08/2016
From page
128
To page
136
Proceedings title
New Trends in Databases and Information Systems. Publisher: Springer. Volume: 637
This activity belongs to research reports
Participants
  • Author: Sandra Maria Gomez Canaval (UPM)
Related research groups, departments, centers and R&D&I institutes
  • Creator: Research group: Grupo de Modelización Matemática y Biocomputación
  • Department: Sistemas Informáticos