Scalable and Efficient Data Management and Analysis for Exascale Computing

Dr. Gary Liu
ECE Department, NJIT


Abstract

With increasing fidelity and resolution, scientific applications at Exascale will generate large volumes of data. These data must be stored, pre-processed, analyzed, and visualized efficiently to minimize the time needed to gain insights from them. Conventional data management strategies are simplistic and can cause severe performance bottlenecks at Exascale, for both data storage and analysis. In this talk, I will discuss scalable and efficient data management strategies that reduce these bottlenecks for applications running at scale (e.g., on 100,000 cores). I will present new techniques that reduce I/O interference in a massively parallel, multi-user environment without requiring changes to the operating system or the application. I will then present PreData, a new paradigm that couples simulations and analytics more efficiently by processing data in memory and in a streaming fashion. Finally, I will briefly introduce my work combining phase identification and statistical modeling to generate compact, high-fidelity benchmarks for evaluating performance on new HPC systems.