Descripción
|
|
---|---|
Today, data are available at an unprecedented scale. An overwhelming quantity of Internet-connected devices generate a constant trickle of pieces of in formation all over the world, much of which are processed in real time or stored for later use. Making sense of these enormous data sets is often an challenging endeav our. Their size demands the use of massive computational resources, which motivates the design of e?cient algorithms. Additionally, these data usually contain measurements of a large number of variables, which poses a wide variety of problems. To address the latter, a family of techniques commonly referred to as dimensionality reduction is studied. In this thesis we address the problem of feature selection, a subset of dimensionality reduction methods that preserve the semantic meaning of the original data variables. To do so, we analyze a problem formulation known as column subset selection. A signi?cant advantage of column subset selection is that the models it produces are simple and in some cases easy to interpret. In an age where notable advances in applied computer science are met with growing concerns about ethics and transparency, model simplicity can become a key requirement in many scenarios. The column subset selection problem has received signi?cant attention in the computer science literature over the last few years, mainly from a theoretical perspective. Here we analyze the problem from a more practical standpoint. Our contributions can be summarized as follows. First, we propose the use of a local search heuristic. We show empirically that it outperforms existing algorithms and derive elementary approximation guarantees. Furthermore, we take advantage of the nature of the problem formulation to derive an e?cient implementation suitable for practical use. Second, we introduce regularized formulations of the problem. We derive a greedy algorithm for these new objectives and demonstrate empirically that it produces improved subsets with respect to multiple criteria. | |
Internacional
|
Si |
ISBN
|
|
Tipo de Tesis
|
Doctoral |
Calificación
|
Sobresaliente cum laude |
Fecha
|
22/11/2018 |