Research reports
Journal articles:
"A Big Data Platform for Large Scale Event Processing".
Year: 2012

Research areas
  • Computer science and information technology

Data
Description
In many emerging applications, the volume of data being streamed is so large that the traditional "store-then-process" paradigm is either not suitable or too inefficient. Moreover, soft real-time requirements may severely constrain the engineering solutions. Many scenarios fit this description. In network security for cloud data centers, very high volumes of IP packets and events from sensors at firewalls, network switches, routers, servers, etc. need to be analyzed, and attacks must be detected in minimal time in order to limit the effect of the malicious activity on the IT infrastructure. In the fraud department of a credit card company, payment requests should be processed in an online manner, as quickly as possible, in order to provide meaningful results in real-time; an ideal solution would detect fraud during the authorization process, which lasts hundreds of milliseconds, and deny the payment authorization, minimizing the damage to the user and the credit card company.

In this context, researchers have proposed a new computing paradigm called Complex Event Processing. A complex event processor (CEP) is a system designed to process continuous streams of data in near real-time. Data flow in streams that are not stored but rather processed on the fly. Similar to a DBMS, a CEP processes queries over tuples. However, while in the context of a DBMS the set of tuples to be processed is fairly static, a CEP deals with an infinite sequence of events. Data processing is performed through continuous queries based on the sliding window model. This approach differs from queries in traditional DBMSs because a continuous query is constantly "standing" over the streaming events, and results are output any time the actual data satisfies the query predicate. A continuous query is modeled as a graph where edges identify data flows and nodes represent operators that process input data.

Centralized CEPs suffered from single-node bottlenecks and were quickly replaced by distributed CEPs, in which the query is distributed across several nodes in order to decrease the per-node tuple processing time and increase the overall throughput. Nevertheless, each node of a distributed CEP must process the whole input flow, which severely limits scalability and application scope. The real research challenge is how to build a parallel-distributed CEP, with data partitioned across processing nodes, that (i) does not require any node to process the whole input and (ii) provides the same results as an ideal centralized execution (i.e., without any delay due to input tuples queuing up). The gist of the problem is how to distribute input tuples so that tuples that must be aggregated or joined together are actually received by the same processing node.

With those goals in mind, we are developing StreamCloud [1,2], a parallel-distributed and elastic CEP that delivers unmatched performance in terms of throughput and allows for cost-effective resource utilization. The StreamCloud project is carried out by the Distributed Systems Lab at Universidad Politecnica de Madrid in collaboration with the Zenith team at INRIA and LIRMM, Montpellier. The system is being exercised for a Security Information and Event Management system in the MASSIF project. In this paper we describe StreamCloud's architecture and evaluate its performance.
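The two ideas at the heart of the description — routing each tuple by the key it will later be aggregated or joined on, and evaluating a continuous query over a sliding window — can be illustrated with a minimal sketch. The following Python code is not StreamCloud's actual implementation; the names (partition, SlidingWindowCount, the 60-second window, and the fraud threshold) are hypothetical, chosen to mirror the credit-card example above.

```python
import hashlib
from collections import deque

NUM_NODES = 4  # hypothetical size of the processing cluster

def partition(key: str, num_nodes: int = NUM_NODES) -> int:
    """Route a tuple by hashing the attribute it will be aggregated or
    joined on, so every tuple sharing that key reaches the same node."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_nodes

class SlidingWindowCount:
    """Continuous query operator: counts tuples per key over a
    time-based sliding window (e.g., payments per card in 60 s)."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.events = {}  # key -> deque of event timestamps

    def process(self, key: str, timestamp: float) -> int:
        q = self.events.setdefault(key, deque())
        q.append(timestamp)
        # Evict tuples that have slid out of the window.
        while q and q[0] <= timestamp - self.window:
            q.popleft()
        return len(q)

# One operator instance per node; each node sees only its partition.
nodes = [SlidingWindowCount(window_seconds=60.0) for _ in range(NUM_NODES)]

# A toy stream of (card_number, timestamp) payment events.
stream = [("card-42", 0.0), ("card-42", 10.0), ("card-7", 12.0),
          ("card-42", 55.0), ("card-42", 70.0)]

for card, ts in stream:
    node_id = partition(card)
    count = nodes[node_id].process(card, ts)
    if count >= 3:  # hypothetical fraud threshold
        print(f"node {node_id}: ALERT {card} made {count} payments in 60s")
```

Because all tuples with the same card number hash to the same node, the per-node counts match what a single centralized operator would compute, yet no node processes the whole stream — exactly the pair of requirements (i) and (ii) stated above.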
International
Yes
ISI JCR
No
Journal title
ERCIM News, Special Theme: Big Data
ISSN
0926-4981
JCR impact factor
Impact information
Volume
89
DOI
Issue number
From page
32
To page
33
Month
APRIL
Ranking

This activity belongs to research reports

Participants

Related research groups, departments, centers and R&D institutes
  • Creator: Research Group: Distributed Systems Laboratory (LSD)