Abstract
A growing disparity between simulation speeds and I/O rates makes it increasingly infeasible for high-performance applications to save all results for offline analysis. By 2024, computers are expected to compute at 10^18 ops/sec but write to disk at only 10^12 bytes/sec: a compute-to-output ratio 200 times worse than on the first petascale systems. In this new world, applications must increasingly perform online data analysis and reduction, tasks that introduce algorithmic, implementation, and programming model challenges that are unfamiliar to many scientists and that have major implications for the design of various elements of exascale systems.
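The figures in the paragraph above can be checked with a few lines of arithmetic. The sketch below uses only the two projected rates and the factor of 200 quoted there. The petascale compute rate of 10^15 ops/sec is an assumption taken from the usual meaning of "petascale", and the resulting petascale write rate is an implied value, not a number stated in this text. All variable names are illustrative.

```python
# Figures quoted in the abstract (projected for 2024).
compute_rate = 1e18   # operations per second
write_rate = 1e12     # bytes per second written to disk

# Operations performed for every byte that can be written out.
ops_per_byte = compute_rate / write_rate

# The abstract calls this ratio 200 times worse than on the first
# petascale systems, which implies the earlier ratio computed here.
implied_petascale_ops_per_byte = ops_per_byte / 200

# Assumption (not stated in the text): petascale means 1e15 ops/sec.
petascale_compute_rate = 1e15
implied_petascale_write_rate = (
    petascale_compute_rate / implied_petascale_ops_per_byte
)

print(f"ops per byte written (2024 projection): {ops_per_byte:.0e}")
print(f"implied petascale ops per byte: {implied_petascale_ops_per_byte:.0f}")
print(f"implied petascale write rate: {implied_petascale_write_rate:.0e} bytes/sec")
```

Under these assumptions, roughly a million operations are performed for every byte that reaches disk, which is why saving all results for offline analysis stops being practical.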
This trend has spurred interest in high-performance online data analysis and reduction methods, motivated by a desire to conserve I/O bandwidth, storage, and/or power; increase accuracy of data analysis results; and/or make optimal use of parallel platforms, among other factors. This requires our community to understand the clear yet complex relationships between application design, data analysis and reduction methods, programming models, system software, hardware, and other elements of a next-generation high-performance computer, particularly given constraints such as applicability, fidelity, performance portability, and power efficiency.
There are at least three important questions that our community is striving to answer: (1) whether several orders of magnitude of data reduction are possible for exascale sciences; (2) what the trade-off is between the performance and the accuracy of data reduction; and (3) how to reduce data effectively while preserving the information hidden in large scientific data. Tackling these challenges requires expertise from computer science, mathematics, and application domains to study the problem holistically and to develop solutions and hardened software tools that can be used by production applications.
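To make the accuracy and reduction trade-off in question (2) concrete, here is a minimal sketch of an error-bounded reduction step. It is a generic uniform quantizer followed by zlib, not a method proposed by the workshop or its speakers. The test signal, the error bound, and the function names `quantize` and `reconstruct` are all hypothetical choices made for illustration.

```python
import math
import struct
import zlib

def quantize(values, error_bound):
    """Map each value to the index of a bin of width 2 * error_bound."""
    step = 2.0 * error_bound
    return [round(v / step) for v in values]

def reconstruct(codes, error_bound):
    """Recover approximate values from bin indices (bin centers)."""
    step = 2.0 * error_bound
    return [c * step for c in codes]

# Hypothetical smooth field standing in for simulation output.
data = [math.sin(0.01 * i) for i in range(10000)]
error_bound = 1e-3  # illustrative absolute error tolerance

codes = quantize(data, error_bound)
restored = reconstruct(codes, error_bound)

# Accuracy metric: largest pointwise absolute error.
max_error = max(abs(a - b) for a, b in zip(data, restored))

# Reduction metric: raw float64 bytes over compressed int16 codes.
# The codes fit in int16 here because |value| <= 1 and the step is 0.002.
raw_bytes = struct.pack(f"<{len(data)}d", *data)
packed = zlib.compress(struct.pack(f"<{len(codes)}h", *codes))
compression_ratio = len(raw_bytes) / len(packed)

print(f"max abs error: {max_error:.2e} (bound {error_bound:.0e})")
print(f"compression ratio: {compression_ratio:.1f}x")
```

Loosening `error_bound` makes the bins wider, so the codes become more repetitive and compress further, at the cost of a larger pointwise error. That is the trade-off in its simplest form.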
The goal of this workshop is to provide a focused venue for researchers in all aspects of data reduction and analysis to present their research results, exchange ideas, identify new research directions, and foster new collaborations within the community.
Topics of interest include but are not limited to:
• (New) AI and Data analysis over extreme-scale scientific datasets
• (New) Large-scale code coupling and workflow
• (New) Compressed sensing
• Application use-cases which can drive the community to develop MiniApps
• Data reduction methods for scientific data including:
o Data deduplication methods
o Motif-specific methods (structured and unstructured meshes, particles, tensors, …)
o Optimal design of data reduction methods
o Methods with accuracy guarantees
• Metrics to measure reduction quality and provide feedback
• Data analysis and visualization techniques that take advantage of the reduced data
• Hardware and data co-design
• Accuracy and performance trade-offs on current and emerging hardware
• New programming models for managing reduced data
• Runtime systems for data reduction
Technical Program
10:01am - 10:10am: DRBSD-6 – Welcome and Introduction
Presenter: Scott Klasky
10:10am - 10:45am: Streaming Data – The Transformation of HPC Systems into Discovery Machines
Presenter: Michael Bussmann
10:45am - 11:20am: The Square Kilometre Array and exa-scale challenges for future astronomy facilities
Presenter: Peter Quinn
11:20am - 11:55am: Toward a Framework for Policy-Driven Adaptive In Situ Workflows
Presenter: Kshitij Mehta
11:55am - 12:30pm: Combining Spatial and Temporal Properties for Improvements in Data Reduction
Presenter: Megan L. Hickman Fulp
12:30pm - 12:50pm: Break
12:50pm - 1:25pm: Invited Talk: Sanjay Ranka
Presenter: Sanjay Ranka
1:25pm - 2:00pm: Dynamic, Adaptive Resource Management for Scientific Workflows
Presenter: Alan Sussman
2:00pm - 2:30pm: Break
2:30pm - 3:05pm: AI for Science: Some Big Data Challenges
Presenter: Rick Stevens
3:05pm - 3:40pm: Intelligent Data Management for Extreme-Scales In-Situ Workflows
Presenter: Manish Parashar
3:40pm - 4:15pm: Data Compression with Deep Learning Based Generative Modeling
Presenter: Jong Youl Choi
4:15pm - 4:45pm: Break
4:45pm - 5:20pm: Data Analytics for Scientific Data Compression
Presenter: Richard Archibald
5:20pm - 5:55pm: Machine learning for science with a deadline: a focus on the scientist
Presenter: Michael Churchill
5:55pm - 6:29pm: A Survey of Resource Constrained Scheduling for In Situ Analysis
Presenter: Todd Munson
6:29pm - 6:30pm: DRBSD-6 – Closing Remarks
Presenter: Scott Klasky
Organizing Committee
Scott Klasky, Oak Ridge National Laboratory
Gary Liu, New Jersey Institute of Technology
Mark Ainsworth, Brown University
Ian Foster, Argonne National Laboratory/University of Chicago
Technical Program Committee (tentative)
Frank Cappello, Argonne National Laboratory
Peter Lindstrom, Lawrence Livermore National Laboratory
Todd Munson, Argonne National Laboratory
John Wu, Lawrence Berkeley National Laboratory
Martin Burtscher, Texas State University
Dan Huang, Sun Yat-sen University
Haihang You, Institute of Computing Technology, Chinese Academy of Sciences
Xubin He, Temple University
Dingwen Tao, Washington State University
Xin Liang, Oak Ridge National Laboratory
Ben Whitney, Oak Ridge National Laboratory
Sheng Di, Argonne National Laboratory
Allison Baker, NCAR
Dorit M. Hammerling, Colorado School of Mines
Call for Papers
The 6th International Workshop on Data Analysis and Reduction for Big Scientific Data (DRBSD-6)
Held in conjunction with SC20: The International Conference for High Performance Computing, Networking, Storage and Analysis
Nov 12th, 2020
Atlanta, GA
A growing disparity between simulation speeds and I/O rates makes it increasingly infeasible for high-performance applications to save all results for offline analysis. By 2024, computers are expected to compute at 10^18 ops/sec but write to disk at only 10^12 bytes/sec: a compute-to-output ratio 200 times worse than on the first petascale systems. In this new world, applications must increasingly perform online data analysis and reduction, tasks that introduce algorithmic, implementation, and programming model challenges that are unfamiliar to many scientists and that have major implications for the design of various elements of exascale systems.
This trend has spurred interest in high-performance online data analysis and reduction methods, motivated by a desire to conserve I/O bandwidth, storage, and/or power; increase accuracy of data analysis results; and/or make optimal use of parallel platforms, among other factors. This requires our community to understand the clear yet complex relationships between application design, data analysis and reduction methods, programming models, system software, hardware, and other elements of a next-generation high-performance computer, particularly given constraints such as applicability, fidelity, performance portability, and power efficiency.
Topics of interest include but are not limited to:
• (New) AI and Data analysis over extreme-scale scientific datasets
• (New) Large-scale code coupling and workflow
• (New) Compressed sensing
• Application use-cases which can drive the community to develop MiniApps
• Data reduction methods for scientific data including:
o Data deduplication methods
o Motif-specific methods (structured and unstructured meshes, particles, tensors, …)
o Optimal design of data reduction methods
o Methods with accuracy guarantees
• Metrics to measure reduction quality and provide feedback
• Data analysis and visualization techniques that take advantage of the reduced data
• Hardware and data co-design
• Accuracy and performance trade-offs on current and emerging hardware
• New programming models for managing reduced data
• Runtime systems for data reduction
Important Dates
Paper Deadline: (New) Extended to September 30th, 2020 (AoE)
Author Notification: (New) by Oct 12th, 2020
Submissions
• Papers should be submitted electronically on the SC Submission Website.
• Paper submissions must be in IEEE format:
http://www.ieee.org/conferences_events/conferences/publishing/templates.html
• DRBSD-6 will accept full papers (no more than 6 pages, excluding references and appendices) and extended abstracts (2 pages, excluding references and appendices).
• Submitted papers will be evaluated by at least 3 reviewers based on technical merit.