The 4th International Workshop on Data Reduction for Big Scientific Data (DRBSD-4)

Abstract

A growing disparity between simulation speeds and I/O rates makes it increasingly infeasible for high-performance applications to save all results for offline analysis. By 2024, computers are expected to compute at 10^18 ops/sec but write to disk at only 10^12 bytes/sec: a compute-to-output ratio 200 times worse than on the first petascale systems. In this new world, applications must increasingly perform online data analysis and reduction. These tasks introduce algorithmic, implementation, and programming model challenges that are unfamiliar to many scientists and that have major implications for the design of various elements of exascale systems.
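The arithmetic behind that comparison can be checked with a minimal sketch. The only inputs are the figures quoted above (10^18 ops/sec, 10^12 bytes/sec, and the factor of 200); the variable and function names are illustrative, and the petascale value is merely what those figures imply, not a measured number.

```python
# Figures quoted in the abstract; names are illustrative assumptions.
exa_compute_ops_per_s = 10**18   # projected compute rate
exa_io_bytes_per_s = 10**12      # projected disk write rate
worsening_factor = 200           # "200 times worse" than first petascale systems


def compute_to_output_ratio(ops_per_s, bytes_per_s):
    """Operations performed per byte that can be written to disk."""
    return ops_per_s / bytes_per_s


exa_ratio = compute_to_output_ratio(exa_compute_ops_per_s, exa_io_bytes_per_s)

# Ratio implied for the first petascale systems, derived only from the
# quoted factor of 200 (an inference, not a figure stated in the abstract).
implied_peta_ratio = exa_ratio / worsening_factor

print(f"exascale: {exa_ratio:.0e} ops per byte written")
print(f"implied petascale: {implied_peta_ratio:.0f} ops per byte written")
```

In other words, the quoted figures correspond to about a million operations for every byte written, against roughly five thousand on the earlier systems.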

This trend has spurred interest in high-performance online data analysis and reduction methods, motivated by a desire to conserve I/O bandwidth, storage, and/or power; increase the accuracy of data analysis results; and/or make optimal use of parallel platforms, among other factors. This requires our community to understand the clear yet complex relationships between application design, data analysis and reduction methods, programming models, system software, hardware, and other elements of a next-generation high-performance computer, particularly given constraints such as applicability, fidelity, performance portability, and power efficiency.

There are at least three important questions that our community is striving to answer: (1) whether several orders of magnitude of data reduction are possible for exascale sciences; (2) what the performance and accuracy trade-offs of data reduction are; and (3) how to reduce data effectively while preserving the information hidden in large scientific data. Tackling these challenges requires expertise from computer science, mathematics, and application domains to study the problem holistically and to develop solutions and hardened software tools that can be used by production applications.

The goal of this workshop is to provide a focused venue for researchers in all aspects of data reduction and analysis to present their research results, exchange ideas, identify new research directions, and foster new collaborations within the community.

Topics of interest include but are not limited to:

• Application use-cases which can drive the community to develop MiniApps

• Data reduction methods for scientific data, including:

  ◦ Data deduplication methods
  ◦ Motif-specific methods (structured and unstructured meshes, particles, tensors, …)
  ◦ Optimal design of data reduction methods
  ◦ Methods with accuracy guarantees

• Metrics to measure reduction quality and provide feedback

• Data analysis and visualization techniques that take advantage of the reduced data

• Hardware and data co-design

• Accuracy and performance trade-offs on current and emerging hardware

• New programming models for managing reduced data

• Runtime systems for data reduction

Workshop Program

2:00pm - 2:09pm Introduction - The 4th International Workshop on Data Reduction for Big Scientific Data (DRBSD-4)

Scott Klasky, Qing Liu, Ian Foster, Mark Ainsworth

2:09pm - 3:09pm Perspectives on Data Reduction from ASCR

Laura Biven, Lucy Nowell

3:09pm - 3:24pm DRBSD-4 – Workshop Afternoon Break

3:24pm - 3:40pm Feature-Relevant Data Reduction for In Situ Workflows

Will Fox, Matthew Wolf, Jeremy Logan, Jong Youl Choi, Scott Klasky, Tahsin Kurc

3:40pm - 3:56pm A Statistical Analysis of Compressed Climate Model Data

Andrew Poppick, Joseph Nardi, Noah Feldman, Allison Baker, Dorit Hammerling

3:56pm - 4:26pm Data Reduction Challenges in Coordinated Simulation and Experimental Fusion Science

Sean Dettrick

4:26pm - 4:42pm Exploring Best Lossy Compression Strategy By Combining SZ with Spatiotemporal Decimation

Xin Liang, Sheng Di, Sihuan Li, Dingwen Tao, Zizhong Chen, Franck Cappello

4:42pm - 4:58pm Synthetic Data Generation for Evaluating Parallel I/O Compression Performance and Scalability

Sean B. Ziegeler, Christopher P. Stone

4:58pm - 5:14pm Amplitude-Aware Lossy Compression for Quantum Circuit Simulation

Xin-Chuan Wu, Sheng Di, Franck Cappello, Hal Finkel, Yuri Alexeev, Frederic T. Chong

5:14pm - 5:30pm A Study on Checkpoints Compression for Adjoint Computation

Kai-Yuan Hou, Sri Hari Krishna Narayanan, Daniel Goldberg, Navjot Kukreja, Bogdan Nicolae, Paul Hovland

Organizing Committee

Scott Klasky, Oak Ridge National Laboratory

Gary Liu, New Jersey Institute of Technology

Mark Ainsworth, Brown University/Oak Ridge National Laboratory

Ian Foster, Argonne National Laboratory/University of Chicago

Web Chair

Huizhang Luo, New Jersey Institute of Technology

Technical Program Committee

Franck Cappello, Argonne National Laboratory

Peter Lindstrom, Lawrence Livermore National Laboratory

Todd Munson, Argonne National Laboratory

Kerstin Van Dam, Brookhaven National Laboratory

George Ostrouchov, Oak Ridge National Laboratory

Scott Klasky, Oak Ridge National Laboratory

Mark Ainsworth, Brown University/Oak Ridge National Laboratory

John Wu, Lawrence Berkeley National Laboratory

Eric Suchyta, Oak Ridge National Laboratory

Martin Burtscher, Texas State University

Haihang You, Institute of Computing Technology, Chinese Academy of Sciences

Call for Papers

The 4th International Workshop on Data Reduction for Big Scientific Data (DRBSD-4)

in Conjunction with SC’18

Nov 11th, 2018

Dallas, TX

As the speed gap between compute and storage continues to widen, increasing data volume and velocity pose major challenges for big data applications in terms of storage and analysis. This demands new research and software tools that can reduce data by several further orders of magnitude, taking advantage of the new architectures and hardware available on next-generation systems. This international workshop on data reduction is a response to this renewed research direction and will provide a focused venue for researchers in this area to present their research results, exchange ideas, identify new research directions, and foster new collaborations within the community.

Important Dates

Paper Deadline: extended to Oct 5th, 2018 (AoE)

Author Notification: by Oct 15th, 2018

Submissions

Papers should be submitted electronically through the SC submission website.

• Paper submissions must be in IEEE format:

http://www.ieee.org/conferences_events/conferences/publishing/templates.html

• Paper submissions must not exceed 5 pages, excluding references.

Submitted papers will be evaluated by at least 3 reviewers on technical merit.