UPM R&D&I Observatory (Observatorio de I+D+i UPM)

Research reports
Conference papers:
Feature Ranking and Selection for Big Data Sets
Research areas
  • Computer science and information technology
The availability of big data sets has led to the successful application of machine learning and data mining to problems that were previously unsolved. The use of these techniques, though, is rarely straightforward. High dimensionality is often one of the main obstacles that must be overcome before learning an adequate model or drawing useful conclusions from large amounts of data. Rank-revealing matrix factorizations can help address this problem by permuting the columns of the input data so that linearly dependent, and thus redundant, columns are moved to the right. These factorizations, however, are designed to operate in a centralized fashion and require the input data to be loaded into main memory, which makes them inapplicable to large data sets. In this paper we prove that data sets consisting of a huge number of rows can be easily transformed into a compact square matrix that preserves the permutation yielded by rank-revealing QR factorizations. This leads to a simple algorithm for running these factorizations on big data sets regardless of their number of rows. The nature of the transformation also makes it possible to deal with high-dimensional data with a controlled loss of precision. We offer experimental results showing that our method can improve the k-means algorithm in both clustering results and running time.
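The idea of compressing many rows into a small square matrix before pivoting can be illustrated with a minimal sketch. The abstract does not say which transformation the paper uses, so the one below is an assumption: row chunks are folded one at a time into a triangular factor R satisfying RᵀR = AᵀA, which keeps every column norm and inner product and therefore the greedy pivot order. The names `compact_square_factor` and `pivot_order`, the synthetic data, and the chunk count are all hypothetical.

```python
import numpy as np


def compact_square_factor(chunks, n_cols):
    """Fold row chunks into a small factor R with R.T @ R == A.T @ A.

    Assumes the data has at least n_cols rows in total, so that the
    result is an n_cols x n_cols matrix. Only one chunk plus the
    current R is held in memory at any time.
    """
    R = np.zeros((0, n_cols))
    for chunk in chunks:
        # QR of the stacked matrix; only the triangular factor is kept.
        R = np.linalg.qr(np.vstack([R, chunk]), mode="r")
    return R


def pivot_order(M):
    """Column order chosen by greedy QR column pivoting.

    At each step the remaining column with the largest residual norm
    is selected and its direction is projected out of the others.
    """
    W = np.array(M, dtype=float)
    remaining = list(range(W.shape[1]))
    order = []
    while remaining:
        norms = np.linalg.norm(W[:, remaining], axis=0)
        k = int(np.argmax(norms))
        j = remaining.pop(k)
        order.append(j)
        if remaining and norms[k] > 0:
            q = W[:, j] / norms[k]
            W[:, remaining] -= np.outer(q, q @ W[:, remaining])
    return order


# Synthetic data (an assumption for illustration): five independent
# columns with distinct scales, plus one redundant column at index 1
# that is a linear combination of two of them.
rng = np.random.default_rng(0)
base = rng.standard_normal((2000, 5)) * np.array([5.0, 4.0, 3.0, 2.0, 1.0])
redundant = 0.1 * base[:, 0] + 0.1 * base[:, 1]
A = np.column_stack([base[:, 0], redundant, base[:, 1:]])

# Process the rows in four chunks instead of loading A at once.
R = compact_square_factor(np.array_split(A, 4), A.shape[1])

order_full = pivot_order(A)     # pivoting on the full 2000 x 6 matrix
order_compact = pivot_order(R)  # pivoting on the compact 6 x 6 matrix
print(R.shape, order_full, order_compact)
```

In this sketch the pivot order computed from the 6 x 6 factor matches the order computed from the full matrix, and the redundant column is placed last. A production setting would more likely use a library routine with column pivoting on R, and would need to treat near-ties between column norms with care.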
Conference name
20th East-European Conference on Advances in Databases and Information Systems - ADBIS 2016 - Workshop: BigDap
Type of participation
Conference venue
Prague, Czech Republic
Conference start date
Conference end date
From page
To page
Proceedings title
New Trends in Databases and Information Systems. Publisher: Springer. Volume: 637
This activity belongs to research reports
  • Author: Sandra Maria Gomez Canaval (UPM)
Related research groups, departments, centers, and R&D&I institutes
  • Creator: Research group: Grupo de Modelización Matemática y Biocomputación
  • Department: Sistemas Informáticos