Research reports
Thesis:
Column subset selection in practice: efficient heuristics and regularization
Year: 2018

Research areas
  • Computer science and information technology

Details
Description
Today, data are available at an unprecedented scale. An overwhelming number of Internet-connected devices generate a constant trickle of information all over the world, much of which is processed in real time or stored for later use. Making sense of these enormous data sets is often a challenging endeavour. Their size demands the use of massive computational resources, which motivates the design of efficient algorithms. Additionally, these data usually contain measurements of a large number of variables, which poses a wide variety of problems. To address the latter, a family of techniques commonly referred to as dimensionality reduction is studied. In this thesis we address the problem of feature selection, a subset of dimensionality reduction methods that preserve the semantic meaning of the original data variables. To do so, we analyze a problem formulation known as column subset selection. A significant advantage of column subset selection is that the models it produces are simple and in some cases easy to interpret. In an age where notable advances in applied computer science are met with growing concerns about ethics and transparency, model simplicity can become a key requirement in many scenarios. The column subset selection problem has received significant attention in the computer science literature over the last few years, mainly from a theoretical perspective. Here we analyze the problem from a more practical standpoint. Our contributions can be summarized as follows. First, we propose the use of a local search heuristic. We show empirically that it outperforms existing algorithms and derive elementary approximation guarantees. Furthermore, we take advantage of the nature of the problem formulation to derive an efficient implementation suitable for practical use. Second, we introduce regularized formulations of the problem. We derive a greedy algorithm for these new objectives and demonstrate empirically that it produces improved subsets with respect to multiple criteria.
International
Yes
ISBN
Thesis type
Doctoral
Grade
Sobresaliente cum laude (Outstanding, with honours)
Date
22/11/2018

This activity is part of the research reports

Participants

Related research groups, departments, centres and R&D&I institutes
  • Creator: Research group: Grupo de Modelización Matemática y Biocomputación
  • Department: Matemática Aplicada a las Tecnologías de la Información y las Comunicaciones
  • Department: Sistemas Informáticos