Observatorio de I+D+i UPM


Research reports

Thesis:

Column subset selection in practice: efficient heuristics and regularization

Year: 2018

Research areas

Computer science and information technology

Details

Description
Today, data are available at an unprecedented scale. An overwhelming quantity of Internet-connected devices generate a constant trickle of pieces of information all over the world, much of which is processed in real time or stored for later use. Making sense of these enormous data sets is often a challenging endeavour. Their size demands the use of massive computational resources, which motivates the design of efficient algorithms. Additionally, these data usually contain measurements of a large number of variables, which poses a wide variety of problems. To address the latter, a family of techniques commonly referred to as dimensionality reduction is studied. In this thesis we address the problem of feature selection, a subset of dimensionality reduction methods that preserve the semantic meaning of the original data variables. To do so, we analyze a problem formulation known as column subset selection. A significant advantage of column subset selection is that the models it produces are simple and in some cases easy to interpret. In an age where notable advances in applied computer science are met with growing concerns about ethics and transparency, model simplicity can become a key requirement in many scenarios. The column subset selection problem has received significant attention in the computer science literature over the last few years, mainly from a theoretical perspective. Here we analyze the problem from a more practical standpoint. Our contributions can be summarized as follows. First, we propose the use of a local search heuristic. We show empirically that it outperforms existing algorithms and derive elementary approximation guarantees. Furthermore, we take advantage of the nature of the problem formulation to derive an efficient implementation suitable for practical use. Second, we introduce regularized formulations of the problem. We derive a greedy algorithm for these new objectives and demonstrate empirically that it produces improved subsets with respect to multiple criteria.
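For readers unfamiliar with the formulation, the following is a minimal sketch of the commonly used column subset selection objective (choose k columns C of a matrix A so that the residual ||A - C C^+ A||_F is small), together with a naive swap-based local search. It is an illustration under stated assumptions, not the thesis's algorithm or its efficient implementation: the function names `css_error` and `local_search_css`, the random initialization, and the first-improvement swap rule are all hypothetical choices made for this example.

```python
import numpy as np

def css_error(A, cols):
    """Squared Frobenius residual ||A - C C^+ A||_F^2 for C = A[:, cols]."""
    C = A[:, cols]
    # Least-squares fit of every column of A using only the selected columns.
    coeffs, *_ = np.linalg.lstsq(C, A, rcond=None)
    return float(np.linalg.norm(A - C @ coeffs, "fro") ** 2)

def local_search_css(A, k, seed=0, max_passes=20):
    """Naive swap-based local search (illustrative assumption, not the thesis's method)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    # Start from a random subset of k distinct column indices.
    selected = list(rng.choice(n, size=k, replace=False))
    best = css_error(A, selected)
    for _ in range(max_passes):
        improved = False
        for i in range(k):
            for j in range(n):
                if j in selected:
                    continue
                # Try replacing the i-th selected column with column j.
                candidate = selected.copy()
                candidate[i] = j
                err = css_error(A, candidate)
                if err < best - 1e-12:
                    selected, best, improved = candidate, err, True
        if not improved:
            break  # no single swap improves the objective
    return sorted(int(c) for c in selected), best
```

Each candidate swap here recomputes the residual from scratch, which is costly for large matrices; the sketch is meant only to make the objective and the notion of a swap move concrete.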
International	Yes
ISBN
Thesis type	Doctoral
Grade	Sobresaliente cum laude (Outstanding cum laude)
Date	22/11/2018

This activity belongs to research reports

Participants

Author: Bruno Ordozgoiti Rubio (group member as OTT grant holder)
Supervisor: Jesus Garcia Lopez de Lacalle (UPM)
Supervisor: Bonifacio Alberto Mozo Velasco (UPM)

Related research groups, departments, centres and R&D&I institutes

Creator: Research group: Grupo de Modelización Matemática y Biocomputación
Department: Matemática Aplicada a las Tecnologías de la Información y las Comunicaciones
Department: Sistemas Informáticos