Enabling Big-data Scientific Workflows in High-performance Networks

Chase Qishi Wu, Associate Professor
University of Memphis


Abstract

Next-generation e-science is producing colossal amounts of data, now frequently termed "big data", on the order of terabytes at present and petabytes or even exabytes in the foreseeable future. These scientific applications typically feature data- and network-intensive workflows composed of computing modules with intricate inter-module dependencies. Application users often need to configure their computing workflows in distributed environments manually and in an ad-hoc manner, which significantly limits the productivity of scientists and constrains resource utilization. Our research focuses on the development of an integrated and automated workflow solution to enable extreme-scale scientific computations in high-performance networks. Together with science collaborators at national laboratories within the U.S. Department of Energy, we design a three-layer workflow architecture in which workflow performance is optimized through the co-scheduling of computing and networking resources based on resource abstraction, bandwidth reservation, and workflow mapping. This talk provides a brief tutorial on big-data scientific applications and shares our research results on various enabling technologies, based on rigorous algorithm design, theoretical analysis of dynamics, and real network implementation, deployment, and evaluation.