UPM R&D&I Observatory

Research reports
Journal articles:
Using machine learning to optimize parallelism in big data applications
Year: 2017
Research areas
  • Computer science and information technology
Data
Description
In-memory cluster computing platforms have gained momentum in recent years due to their ability to analyse large amounts of data in parallel. These platforms are complex and difficult to manage. In addition, there is a lack of tools to better understand and optimize such platforms, which form the backbone of big data infrastructure and technologies. This leads directly to underutilization of available resources and to application failures in such environments. One key way to address this problem is to optimize the task parallelism of applications in such environments. In this paper, we propose a machine-learning-based method that recommends optimal parameters for task parallelization in big data workloads. By monitoring and gathering metrics at the system and application levels, we are able to find statistical correlations that allow us to characterize and predict the effect of different parallelism settings on performance. These predictions are used to recommend an optimal configuration to users before they launch their workloads in the cluster, avoiding possible failures, performance degradation and waste of resources. We evaluate our method with a benchmark of 15 Spark applications on the Grid5000 testbed. We observe a performance gain of up to 51% when using the recommended parallelism settings. The model is also interpretable and can give the user insight into how different metrics and parameters affect performance.
International
Yes
ISI JCR
Yes
Journal title
Future Generation Computer Systems
ISSN
0167-739X
JCR impact factor
3.997
Impact information
Volume
DOI
10.1016/j.future.2017.07.003
Issue number
From page
1
To page
17
Month
July
Ranking
This activity belongs to the research reports
Participants
  • Author: Alvaro Brandon Hernandez (UPM)
  • Author: Maria de los Santos Perez Hernandez (UPM)
  • Author: Smrati Gupta
  • Author: Víctor Muntés-Mulero
Related R&D&I research groups, departments, centres and institutes
  • Creator: Research Group: Ontology Engineering Group
  • Department: Arquitectura y Tecnología de Sistemas Informáticos
S2i 2021 Observatorio de investigación @ UPM, with the collaboration of the Consejo Social UPM
Co-funded by MINECO under the INNCIDE 2011 Programme (OTR-2011-0236)
Co-funded by MINECO under the INNPACTO Programme (IPT-020000-2010-22)