The 5th International Workshop on Data Analysis and Reduction for Big Scientific Data (DRBSD-5)

In cooperation with IEEE Computer Society Technical Consortium on High Performance Computing

Held in conjunction with SC19:

The International Conference for High Performance Computing, Networking, Storage and Analysis

 

Abstract

A growing disparity between simulation speeds and I/O rates makes it increasingly infeasible for high-performance applications to save all results for offline analysis. By 2024, computers are expected to compute at 10^18 ops/sec but write to disk only at 10^12 bytes/sec: a compute-to-output ratio 200 times worse than on the first petascale systems. In this new world, applications must increasingly perform online data analysis and reduction, tasks that introduce algorithmic, implementation, and programming model challenges that are unfamiliar to many scientists and that have major implications for the design of various elements of exascale systems.
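To make the 200x figure concrete, the short Python sketch below compares the two compute-to-output ratios. The exascale numbers come from the paragraph above; the petascale numbers (roughly 10^15 ops/sec of compute and roughly 2x10^11 bytes/sec of file-system bandwidth) are illustrative assumptions, not figures from this workshop.

    # Back-of-the-envelope check of the compute-to-output disparity described above.
    # Exascale figures are from the text; petascale figures are illustrative assumptions.
    exascale_ops_per_sec = 1e18        # projected compute rate (~2024)
    exascale_io_bytes_per_sec = 1e12   # projected disk write rate (~2024)
    petascale_ops_per_sec = 1e15       # assumed: ~1 PF/s early petascale system
    petascale_io_bytes_per_sec = 2e11  # assumed: ~200 GB/s file-system bandwidth

    exascale_ratio = exascale_ops_per_sec / exascale_io_bytes_per_sec     # ops per byte written
    petascale_ratio = petascale_ops_per_sec / petascale_io_bytes_per_sec  # ops per byte written

    print(f"exascale:  {exascale_ratio:.0e} ops per byte written")       # ~1e+06
    print(f"petascale: {petascale_ratio:.0e} ops per byte written")      # ~5e+03
    print(f"ratio worsens by ~{exascale_ratio / petascale_ratio:.0f}x")  # ~200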

This trend has spurred interest in high-performance online data analysis and reduction methods, motivated by a desire to conserve I/O bandwidth, storage, and/or power; to increase the accuracy of data analysis results; and to make optimal use of parallel platforms, among other factors. This requires our community to understand the complex relationships between application design, data analysis and reduction methods, programming models, system software, hardware, and other elements of a next-generation high-performance computer, particularly given constraints such as applicability, fidelity, performance portability, and power efficiency.

There are at least three important questions that our community is striving to answer: (1) whether several orders of magnitude of data reduction is possible for exascale science; (2) how to understand the performance and accuracy trade-offs of data reduction; and (3) how to effectively reduce data while preserving the information hidden in large scientific datasets. Tackling these challenges requires expertise from computer science, mathematics, and application domains to study the problem holistically, and to develop solutions and hardened software tools that can be used by production applications.
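To illustrate question (2), the accuracy side of the trade-off is usually expressed as an error bound or error metric on the reconstructed data, and the performance side as a compression ratio or throughput. The sketch below is a minimal NumPy example that uses a uniform quantizer as a stand-in for a real lossy compressor (such as SZ or ZFP, not shown here) to measure one such trade-off; the synthetic field, error bound, and idealized bit count are all illustrative assumptions.

    import numpy as np

    # Minimal sketch: reduce a synthetic field under an absolute error bound and
    # report the resulting (idealized) compression ratio and the actual maximum error.
    # A real compressor (SZ, ZFP, MGARD, ...) would replace the quantizer below.
    rng = np.random.default_rng(0)
    field = np.cumsum(rng.standard_normal(1_000_000))   # smooth-ish synthetic "simulation" field (float64)

    error_bound = 0.01                                  # user-specified absolute error bound (assumed)
    bins = np.round(field / (2 * error_bound))          # integer bin indices
    reconstructed = bins * (2 * error_bound)            # decoded approximation

    max_error = np.max(np.abs(field - reconstructed))               # accuracy side of the trade-off
    bits_per_value = np.ceil(np.log2(bins.max() - bins.min() + 1))  # idealized, ignores entropy coding
    compression_ratio = 64 / bits_per_value                         # original values are 64-bit doubles

    print(f"max pointwise error: {max_error:.4g} (bound {error_bound})")
    print(f"idealized compression ratio: {compression_ratio:.1f}x")

Tightening the error bound improves accuracy but lowers the compression ratio (and, in a real compressor, typically the throughput as well), which is the trade-off that question (2) asks the community to characterize.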

The goal of this workshop is to provide a focused venue for researchers in all aspects of data reduction and analysis to present their research results, exchange ideas, identify new research directions, and foster new collaborations within the community.

Topics of interest include but are not limited to:

• (New) AI and data analysis over extreme-scale scientific datasets

• (New) Large-scale code coupling and workflows

• Application use cases that can drive the community to develop mini-apps

• Data reduction methods for scientific data including:

       o Data deduplication methods

       o Motif-specific methods (structured and unstructured meshes, particles, tensors, …)

       o Optimal design of data reduction methods

       o Methods with accuracy guarantees

• Metrics to measure reduction quality and provide feedback

• Data analysis and visualization techniques that take advantage of the reduced data

• Hardware and data co-design

• Accuracy and performance trade-offs on current and emerging hardware

• New programming models for managing reduced data

• Runtime systems for data reduction

 

Workshop Program

9:00am - 9:10am, Opening Remarks, Scott Klasky, Qing Liu, Ian Foster, Mark Ainsworth

9:10am - 9:40am, Keynote Talk: Data Analysis through Advanced Scientific Computing Research at the Department of Energy, William Spotz

9:40am - 10:00am, Understanding Performance-Quality Trade-offs in Scientific Visualization Workflows with Lossy Compression, Jieyang Chen, David Pugmire, Matthew Wolf, Nicholas Thompson, Jeremy Logan, Kshitij Mehta, Lipeng Wan, Jong Youl Choi, Ben Whitney, Scott Klasky

10:00am - 10:30am, Morning Break

10:30am - 11:00am, Invited Talk: Data Challenges for Nuclear Femtography, Amber Boehnlein

11:00am - 11:20am, A Collaborative Effort to Improve Lossy Compression Methods for Climate Data, Dorit M. Hammerling, Allison H. Baker, Alexander Pinard, Peter Lindstrom

11:20am - 11:50am, Invited Talk: Changing Science through Online Analysis, Kerstin Kleese Van Dam

11:50am - 12:10pm, PAVE: An In Situ Framework for Scientific Visualization and Machine Learning Coupling, Samuel Leventhal, Mark Kim, David Pugmire

12:10pm - 12:30pm, A Co-Design Study Of Fusion Whole Device Modeling Using Code Coupling, Jong Youl Choi, Jeremy Logan, Kshitij Mehta, Eric Suchyta, William Godoy, Nick Thompson, Lipeng Wan, Jieyang Chen, Norbert Podhorszki, Matthew Wolf, Scott Klasky, Julien Dominski, Choong-Seock Chang

12:30pm - 2:00pm, Lunch Break

2:00pm - 2:30pm, Keynote Talk: Data before Data in Fusion Science, Choong-Seock Chang

2:30pm - 3:00pm, Invited Talk: Scientific Data at Exascale: Architecting Systems for Performance, Productivity, and Parallelism, Rangan Sukumar

3:00pm - 3:30pm, Afternoon Break

3:30pm - 4:00pm, Invited Talk: High Performance Computing at BP, Keith Gray

4:00pm - 4:30pm, Invited Talk: In Situ Data Analytics for Next Generation Molecular Dynamics Workflows, Michela Taufer

4:30pm - 4:50pm, Analyzing the Performance and Accuracy of Lossy Checkpointing on Sub-Iteration of NWChem, Tasmia Reza, Jon Calhoun, Kristopher Keipert, Sheng Di, Franck Cappello, Xin Liang

4:50pm - 5:10pm, Using Machine Learning to Reduce Ensembles of Geological Models for Oil and Gas Exploration, Anna Roubickova, Nick Brown, Oliver Brown

5:10pm - 5:30pm, Exploring Lossy Compression of Gene Expression Matrices, Coleman B. McKnight, Alexandra L. Poulos, M. Reed Bender, Jon C. Calhoun, F. Alex Feltus

 

Organizing Committee

Scott Klasky, Oak Ridge National Laboratory

Gary Liu, New Jersey Institute of Technology

Mark Ainsworth, Brown University/Oak Ridge National Laboratory

Ian Foster, Argonne National Laboratory/University of Chicago

Web Chair

Dan Huang, New Jersey Institute of Technology

 

Technical Program Committee

Franck Cappello, Argonne National Laboratory

Peter Lindstrom, Lawrence Livermore National Laboratory

Todd Munson, Argonne National Laboratory

Kerstin Van Dam, Brookhaven National Laboratory

George Ostrouchov, Oak Ridge National Laboratory

Scott Klasky, Oak Ridge National Laboratory

Mark Ainsworth, Brown University/Oak Ridge National Laboratory

John Wu, Lawrence Berkeley National Laboratory

Eric Suchyta, Oak Ridge National Laboratory

Martin Burtscher, Texas State University

Haihang You, Institute of Computing Technology, Chinese Academy of Sciences

Xubin He, Temple University

 

Call for Papers

The 5th International Workshop on Data Analysis and Reduction for Big Scientific Data (DRBSD-5)

In cooperation with IEEE Computer Society Technical Consortium on High Performance Computing

Held in conjunction with SC19: The International Conference for High Performance Computing, Networking, Storage and Analysis

November 17th, 2019

Denver, CO

A growing disparity between simulation speeds and I/O rates makes it increasingly infeasible for high-performance applications to save all results for offline analysis. By 2024, computers are expected to compute at 10^18 ops/sec but write to disk only at 10^12 bytes/sec: a compute-to-output ratio 200 times worse than on the first petascale systems. In this new world, applications must increasingly perform online data analysis and reduction, tasks that introduce algorithmic, implementation, and programming model challenges that are unfamiliar to many scientists and that have major implications for the design of various elements of exascale systems.

This trend has spurred interest in high-performance online data analysis and reduction methods, motivated by a desire to conserve I/O bandwidth, storage, and/or power; to increase the accuracy of data analysis results; and to make optimal use of parallel platforms, among other factors. This requires our community to understand the complex relationships between application design, data analysis and reduction methods, programming models, system software, hardware, and other elements of a next-generation high-performance computer, particularly given constraints such as applicability, fidelity, performance portability, and power efficiency.

Topics of interest include but are not limited to:

• (New) AI and data analysis over extreme-scale scientific datasets

• (New) Large-scale code coupling and workflows

• Application use cases that can drive the community to develop mini-apps

• Data reduction methods for scientific data including:

         o Data deduplication methods

         o Motif-specific methods (structured and unstructured meshes, particles, tensors, …)

         o Optimal design of data reduction methods

         o Methods with accuracy guarantees

• Metrics to measure reduction quality and provide feedback

• Data analysis and visualization techniques that take advantage of the reduced data

• Hardware and data co-design

• Accuracy and performance trade-offs on current and emerging hardware

• New programming models for managing reduced data

• Runtime systems for data reduction

 

Important Dates

Paper Deadline: September 20th, 2019 (AoE)

Author Notification: by September 30th, 2019

Final Submission: by October 13th, 2019 (AoE)

 

Submissions

Papers should be submitted electronically via the SC Submission Website.

• Paper submissions must be in IEEE format:

http://www.ieee.org/conferences_events/conferences/publishing/templates.html

• (New) Submitted papers will be peer-reviewed, and accepted papers will be published by IEEE TCHPC.

• Papers are limited to 6 pages, excluding references.

• Submitted papers will be evaluated by at least three reviewers based on technical merit.

• Authors are required to submit the final copy of the paper along with the IEEE copyright form.