Novel Storage Architectures for Big Data Challenges in Extreme-scale HPC Systems

Dr. Sudharshan S Vazhkudai
Oak Ridge National Laboratory


Abstract

Massively parallel scientific applications running on extreme-scale supercomputers (e.g., Jaguar) produce hundreds of terabytes of result and checkpoint snapshot data per run, driving the need for storage solutions that improve their I/O performance. Traditional parallel file systems (PFS) in high performance computing (HPC) centers are unable to keep up with such high data rates, creating a “storage wall.” In future multi-petaflop and exaflop systems, the many cores and the data they can produce will expose this storage performance gap even more dramatically. In this talk, I will present a novel multi-tiered storage architecture that uses hybrid node-local resources to construct a dynamic, aggregate data store for extreme-scale machines. The solution relies on the concerted use of diverse storage resources (e.g., DRAM, SSD) so that the aggregate store approaches the performance of the fastest component technology at close to the cost of the least expensive one. In addition to serving as a performance impedance-matching device between applications and the PFS, the store doubles as a staging area in which data analytics can be conducted in situ to reduce and triage the massive amounts of data. Along the way, I will highlight how this notion of storage aggregation can help alleviate big data challenges not only in supercomputers, but also in desktop grid computing and mid-size cluster environments.
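
The performance and cost goal stated above can be illustrated with a small back-of-the-envelope model. This is only a sketch under stated assumptions, not the architecture presented in the talk: the `Tier` class, the function names, and every number used with them are hypothetical and chosen purely for illustration. It assumes a two-tier store in which a given fraction of the traffic is absorbed by the fast tier and the rest goes to the slow tier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One storage tier (hypothetical model, not from the talk)."""
    name: str
    bandwidth_gb_per_s: float  # sustained bandwidth in GB/s (illustrative)
    cost_per_gb: float         # cost in dollars per GB (illustrative)
    capacity_gb: float         # capacity contributed by this tier


def effective_bandwidth(fast: Tier, slow: Tier, fast_fraction: float) -> float:
    """Bandwidth seen by the application when `fast_fraction` of the bytes
    are served by the fast tier and the remainder by the slow tier."""
    if not 0.0 <= fast_fraction <= 1.0:
        raise ValueError("fast_fraction must be between 0 and 1")
    # Time to move one GB is the traffic-weighted sum of per-tier times.
    seconds_per_gb = (fast_fraction / fast.bandwidth_gb_per_s
                      + (1.0 - fast_fraction) / slow.bandwidth_gb_per_s)
    return 1.0 / seconds_per_gb


def blended_cost_per_gb(fast: Tier, slow: Tier) -> float:
    """Capacity-weighted cost per GB of the combined store."""
    total_cost = (fast.cost_per_gb * fast.capacity_gb
                  + slow.cost_per_gb * slow.capacity_gb)
    return total_cost / (fast.capacity_gb + slow.capacity_gb)


if __name__ == "__main__":
    # Illustrative values only; these are not measurements.
    dram = Tier("DRAM", bandwidth_gb_per_s=10.0, cost_per_gb=8.0, capacity_gb=16.0)
    ssd = Tier("SSD", bandwidth_gb_per_s=1.0, cost_per_gb=1.0, capacity_gb=240.0)
    for fraction in (0.0, 0.5, 0.9, 0.99):
        bw = effective_bandwidth(dram, ssd, fraction)
        print(f"fast fraction {fraction:.2f}: {bw:.2f} GB/s")
    print(f"blended cost: {blended_cost_per_gb(dram, ssd):.4f} $/GB")
```

In this model the blended cost stays near that of the cheaper tier as long as the fast tier is a small share of total capacity, while the effective bandwidth moves toward that of the fast tier only when nearly all traffic is absorbed there. That is why the placement of data across tiers matters as much as the hardware mix.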