Research Reports
From traditional multi-stage learning to end-to-end deep learning for computer vision applications

Research Areas
  • Electronic and communications technology

The renaissance of Deep Neural Networks in the era of big data, along with the use of high-performance hardware that reduces computational time, has changed the paradigm of machine learning, especially in the field of computer vision. Whereas systems based on traditional machine learning rely on multiple stages and hand-crafted features to gain insight into the problem, Convolutional Neural Networks automatically learn the features that maximize the learning accuracy directly from raw images in an end-to-end manner. The purpose of this dissertation is to show the gap between traditional multi-stage learning systems and end-to-end deep learning systems, addressing different applications for a qualitative comparison. First, an expert-knowledge recognition system has been developed to deal with dynamic hand gestures. The key aspects of this system are its hand-crafted image and video descriptors and the pipeline of the whole system. These descriptors have been designed to cope with the difficulties of vision-based approaches, such as illumination changes, intra-class and inter-class variances, and multiple scales. The multiple stages of the system solve the intermediate steps that are necessary to successfully apply these descriptors. Since the proposed hand-gesture recognition system has been designed for a human-computer interface, it comprises detection and tracking stages to localize the object of interest, and a recognition stage to categorize the performed gesture. Second, deep learning approaches have been proposed for different computer vision applications. Research efforts have focused on building these types of end-to-end systems to address the weaknesses present in traditional learning. Unlike the previous approach, they need neither multiple stages to perform the target task nor feature engineering. Their architecture designs depend on the task to be solved, its complexity, and the amount of available data. These guidelines have been applied to common vision-based applications, such as vehicle detection and hand-gesture recognition, but also to more challenging situations, such as robotics applications.
Grade
Sobresaliente cum laude (Outstanding cum laude)

Related Research Groups, Departments and Institutes
  • Creator: Research Group: Grupo de Tratamiento de Imágenes (GTI)
  • R&D&I Center or Institute: Centro de I+D+i en Procesado de la Información y Telecomunicaciones
  • Department: Señales, Sistemas y Radiocomunicaciones