Research reports
Conference papers:
Feature Ranking and Selection for Big Data Sets.
Year: 2016

Research areas
  • Computer science and information technology

Details
Description
The availability of big data sets has led to the successful application of machine learning and data mining to problems that were previously unsolved. The use of these techniques, though, is rarely straightforward. High dimensionality is often one of the main obstacles that must be overcome before learning an adequate model or drawing useful conclusions from large amounts of data. Rank-revealing matrix factorizations can help address this problem by permuting the columns of the input data so that linearly dependent, and thus redundant, columns are moved to the right. These factorizations, however, are designed to operate in a centralized fashion, requiring the input data to be loaded into main memory, which makes them inapplicable to large data sets. In this paper we prove that data sets consisting of a huge number of rows can be easily transformed into a compact square matrix that preserves the permutation yielded by rank-revealing QR factorizations. This leads to a simple algorithm for running these factorizations on big data sets regardless of their number of rows. The nature of the transformation also makes it possible to deal with high-dimensional data with a controlled loss of precision. We offer experimental results showing that our method can provide improvements for the k-means algorithm, both in clustering results and in running time. (http://link.springer.com/chapter)
International
Yes
Conference name
ADBIS 2016: New Trends in Databases and Information Systems - ADBIS 2016 Short Papers and Workshops, BigDap, DCSA, DC
Type of participation
960
Conference location
Prague, Czech Republic
Peer reviewed
Yes
ISBN or ISSN
978-3-319-44065-1
DOI
10.1007/978-3-319-44066-8_14
Conference start date
28/08/2016
Conference end date
31/08/2016
From page
128
To page
136
Proceedings title
Proceedings ADBIS 2016: New Trends in Databases and Information Systems. Communications in Computer and Information Science 637, Springer 2016
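
Illustrative sketch
The description states that a data set with a very large number of rows can be reduced to a compact square matrix that preserves the column permutation of a rank-revealing QR factorization. The record does not say which transformation the paper uses, so the sketch below is only an assumption: it uses the Gram matrix G = A^T A, accumulated chunk by chunk, and relies on the standard fact that, in exact arithmetic, a diagonally pivoted Cholesky factorization of G selects pivots in the same order as QR with column pivoting applied to A. The function names gram_from_chunks and pivoted_cholesky_order, the tolerance, and the toy data are hypothetical and are not taken from the paper.

```python
def gram_from_chunks(chunks, d):
    """Accumulate G = A^T A (d x d) from an iterable of row chunks.

    Only one chunk has to be in memory at a time, so the number of
    rows of A does not limit the computation.
    """
    G = [[0.0] * d for _ in range(d)]
    for chunk in chunks:
        for row in chunk:
            for i in range(d):
                ri = row[i]
                for j in range(i, d):
                    G[i][j] += ri * row[j]
    # Fill the lower triangle by symmetry.
    for i in range(d):
        for j in range(i):
            G[i][j] = G[j][i]
    return G


def pivoted_cholesky_order(G, tol=1e-10):
    """Greedy column order from a diagonally pivoted Cholesky of G.

    Returns (perm, rank). perm lists the column indices of A in pivot
    order; columns judged linearly dependent on earlier ones end up
    at the right. tol is a hypothetical relative threshold.
    """
    d = len(G)
    S = [row[:] for row in G]  # working copy, becomes the Schur complement
    perm = list(range(d))
    scale = max(S[i][i] for i in range(d)) or 1.0
    rank = 0
    for k in range(d):
        # The diagonal holds the squared residual norm of each column.
        p = max(range(k, d), key=lambda i: S[perm[i]][perm[i]])
        if S[perm[p]][perm[p]] <= tol * scale:
            break  # remaining columns are numerically dependent
        perm[k], perm[p] = perm[p], perm[k]
        pk = perm[k]
        pivot = S[pk][pk]
        rest = perm[k + 1:]
        for i in rest:
            f = S[i][pk] / pivot
            for j in rest:
                S[i][j] -= f * S[pk][j]
        rank += 1
    return perm, rank


# Toy data (made up): the third column is exactly twice the first.
pairs = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 2)]
rows = [[float(a), float(b), 2.0 * a] for a, b in pairs]

# Process the rows in two chunks, as if they did not fit in memory together.
G = gram_from_chunks([rows[:3], rows[3:]], d=3)
perm, rank = pivoted_cholesky_order(G)
print("column order:", perm, "numerical rank:", rank)
```

Forming A^T A squares the condition number of the problem, so this shortcut is less robust than working with an R factor computed from the chunks; it is shown here only because it fits in a few lines of plain Python.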

This activity belongs to research reports

Participants

Related research groups, departments, centers and R&D&I institutes
  • Creator: Research group: Internet de Nueva Generación
  • Department: Sistemas Informáticos