Toward a Unified HPC and Big Data Runtime

Published in STREAM, 2015

The landscape of high performance computing (HPC) has changed radically over the past decade as the community has surpassed Petascale performance and now aims for Exascale. In this effort, chip fabricators and hardware architects have run up against the physical limits of chip manufacturing. The effects of these challenges extend beyond the underlying hardware and demand the attention of the entire software stack. As the fight for raw performance continues, a new field of computing has emerged. Big Data Analytics has been heralded as the fourth paradigm of science for its ability to turn enormous volumes of data into actionable knowledge, and its influence spans commercial, political, and scientific interests.

While HPC and Big Data seem to approach knowledge discovery from two disparate angles, the technical challenges they face place them on a converging path. Both must address scalability, data movement, energy efficiency, and resiliency in large computing systems. Furthermore, in future systems the sheer degree of parallelism will produce data far faster than the I/O subsystem can absorb it. This makes the traditional workflow of running a scientific simulation and analyzing its results post mortem impractical, giving rise to in-situ analytic techniques that process simulation data while it is still in memory.
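As a minimal illustration of the in-situ idea (a sketch of ours, not part of the original abstract; the field size, step count, and update kernel are placeholders), the following C++ fragment reduces the simulation state to a small summary inside the time-step loop, so that only a few bytes per step, rather than the full field, ever need to cross the I/O bottleneck.

    #include <algorithm>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    int main() {
        const std::size_t n = 1 << 20;      // placeholder field size
        std::vector<double> field(n, 1.0);

        for (int step = 0; step < 100; ++step) {
            // Stand-in for one simulation time step updating the field.
            for (auto& x : field) x *= 1.0001;

            // In-situ reduction: a couple of doubles of output per step
            // instead of n * sizeof(double) bytes of raw field data.
            auto [lo, hi] = std::minmax_element(field.begin(), field.end());
            std::printf("step %d min %.6f max %.6f\n", step, *lo, *hi);
        }
    }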

From an HPC perspective, fine-grain, event-driven execution models have been proposed as a flexible and efficient way to utilize the underlying hardware. We propose extending such a fine-grain execution model to support Big Data techniques, both as a method to use Exascale resources efficiently and as a means to join scientific simulation with its analysis, overcoming hardware limits (such as limited I/O bandwidth) and reducing the overall time to knowledge discovery.
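A rough sketch of what this coupling might look like follows, using standard C++ futures as a stand-in for a fine-grain, event-driven runtime; the chunk decomposition, task bodies, and task names are illustrative assumptions rather than the interface of any particular system.

    #include <cstdio>
    #include <future>
    #include <numeric>
    #include <utility>
    #include <vector>

    // Stand-in simulation task: produce one chunk of field data.
    std::vector<double> simulate_chunk(int chunk_id) {
        return std::vector<double>(1 << 16, 1.0 + 0.001 * chunk_id);
    }

    // Stand-in analysis task: reduce a chunk to a scalar summary.
    double analyze_chunk(const std::vector<double>& chunk) {
        return std::accumulate(chunk.begin(), chunk.end(), 0.0) / chunk.size();
    }

    int main() {
        std::vector<std::future<double>> summaries;

        for (int id = 0; id < 8; ++id) {
            // Simulation task for one chunk.
            auto sim = std::async(std::launch::async, simulate_chunk, id);

            // Analysis task chained on the simulation result: it consumes
            // the chunk directly from memory, never staging it through the
            // file system, approximating an event-driven continuation.
            summaries.push_back(std::async(
                std::launch::async,
                [s = std::move(sim)]() mutable { return analyze_chunk(s.get()); }));
        }

        for (std::size_t id = 0; id < summaries.size(); ++id)
            std::printf("chunk %zu mean %.6f\n", id, summaries[id].get());
    }

In a true fine-grain runtime the analysis continuation would be scheduled when its input becomes ready rather than by blocking a thread on get(), but the dataflow structure is the same: analysis fires per chunk as soon as the simulation produces it, with nothing routed through the file system.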