UPM R&D&I Observatory

Research reports
Conference papers:
Spark versus Flink: Understanding Performance in Big Data Analytics Frameworks
Year: 2016
Research areas
  • Computer science and information technology
  • Computer systems
Details
Description
Big Data analytics has recently gained increasing popularity as a tool to process large amounts of data on demand. Spark and Flink are two Apache-hosted data analytics frameworks that facilitate the development of multi-step data pipelines using directed acyclic graph patterns. Making the most out of these frameworks is challenging because efficient executions strongly rely on complex parameter configurations and on an in-depth understanding of the underlying architectural choices. Although extensive research has been devoted to improving and evaluating the performance of such analytics frameworks, most studies benchmark these platforms against Hadoop as a baseline, a rather unfair comparison given their fundamentally different design principles. This paper aims to bring some justice in this respect by directly evaluating the performance of Spark and Flink. Our goal is to identify and explain the impact of the different architectural choices and parameter configurations on the perceived end-to-end performance. To this end, we develop a methodology for correlating the parameter settings and the operators' execution plan with resource usage. We use this methodology to dissect the performance of Spark and Flink with several representative batch and iterative workloads on up to 100 nodes. Our key finding is that neither framework outperforms the other for all data types, sizes and job patterns. This paper provides a fine-grained characterization of the cases in which each framework is superior, and highlights how this performance correlates with operators, resource usage and the specifics of the internal framework design.
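The description refers to multi-step data pipelines expressed as directed acyclic graphs (DAGs). As a hedged illustration only, the sketch below models a tiny word-count pipeline as a DAG of stages and runs them in topological order. The stage names, the sample input, and the `run_pipeline` helper are hypothetical. This is neither the Spark nor the Flink API, and it ignores distribution, partitioning, and the staged versus pipelined execution strategies that the paper compares.

```python
from collections import Counter
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical stages: each receives the outputs of its upstream stages,
# ordered by the sorted names of those upstream stages.
def source(_inputs):
    return ["spark flink spark", "flink hadoop spark"]

def tokenize(inputs):
    (lines,) = inputs
    return [word for line in lines for word in line.split()]

def count(inputs):
    (words,) = inputs
    return Counter(words)

def total(inputs):
    (words,) = inputs
    return len(words)

def report(inputs):
    # Upstream order is ["count", "total"] because dependency names are sorted.
    counts, n_words = inputs
    word, freq = counts.most_common(1)[0]
    return (word, freq, n_words)

STAGES = {"source": source, "tokenize": tokenize, "count": count,
          "total": total, "report": report}

# Each stage maps to the set of stages it depends on. "tokenize" fans out
# to two branches that join again in "report", so this is a DAG, not a chain.
DEPS = {"source": set(), "tokenize": {"source"}, "count": {"tokenize"},
        "total": {"tokenize"}, "report": {"count", "total"}}

def run_pipeline(stages, deps):
    """Execute every stage once, after all of its dependencies have run."""
    results = {}
    for name in TopologicalSorter(deps).static_order():
        upstream = [results[d] for d in sorted(deps[name])]
        results[name] = stages[name](upstream)
    return results
```

Calling `run_pipeline(STAGES, DEPS)["report"]` yields the most frequent word, its count, and the total word count for the sample input. Real engines add a planner that chooses how to partition, schedule, and pipeline such a graph across nodes, and the description identifies those design choices as the source of the performance differences.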
International
No
Conference name
IEEE Cluster 2016
Type of participation
960
Conference location
Taipei, Taiwan
Peer-reviewed
Yes
ISBN or ISSN
2168-9253
DOI
10.1109/CLUSTER.2016.22
Conference start date
12/09/2016
Conference end date
16/09/2016
First page
433
Last page
442
Proceedings title
IEEE Cluster 2016
This activity belongs to the research reports.
Participants
  • Author: Maria de los Santos Perez Hernandez (UPM)
Related research groups, departments, and R&D&I centres and institutes
  • Creator: Research group: Ontology Engineering Group
  • Department: Arquitectura y Tecnología de Sistemas Informáticos