Since the production of the first X-rays in 2009, LCLS has supported a wide range of novel experiments that have increased our understanding of various physical and chemical phenomena. While the results of these experiments are continuously published in peer-reviewed journals, providing access to the raw data behind these interpretations would also benefit the scientific community. Doing so would help researchers interested in developing new methods for data analysis and make the research more accessible, especially to scientists who are unable to access LCLS.

In the workshop, we will discuss publishing XFEL data using structural biology as a case study. We chose structural biology because of its long history of publishing experimental data, but the ideas and concepts discussed in the workshop should be applicable to all areas of research at LCLS. Various aspects of the publishing pipeline will be discussed, including:
(a) Reducing the barrier to publishing raw data for LCLS users
(b) Processing of raw data made available to the public
(c) Good data and metadata management by adhering to the Findability, Accessibility, Interoperability, and Reusability (FAIR) principles
(d) Maintenance of storage infrastructure for repositories like CXIDB (www.cxidb.org)
(e) Strategies to facilitate colocation of data and compute resources

We will solicit input from the user community to clearly define the requirements of a data portal and to identify which features would make it most capable of accelerating methods development, providing useful benchmarks for analysis, and structuring data collected across light sources in a homogeneous fashion. Our aim is to outline solutions that could be proposed to the DOE for a data portal that facilitates the development of machine learning, multimodal, and other novel analyses for data collected at LCLS and similar facilities.
Location: SLAC Building 53, Trinity, 1350-A
Organizers:
Asmit Bhowmick
Aaron Brewster
Jan Kern
Frederic Poitevin
Ariana Peck
Jana Thayer
Time (PST) | Title of Talk | Speaker (First, Last Name) | Institution |
1:00 PM | Introduction (10 mins) | Asmit Bhowmick, Jan Kern | Lawrence Berkeley National Laboratory |
1:10 PM | LCLS data infrastructure and recent updates (20-25 mins) | Frederic P. Poitevin, Jana Thayer | SLAC National Accelerator Laboratory |
1:30 PM | Data sharing, NeXus file format, and processing raw data on NERSC (20 mins) | Aaron Brewster | Lawrence Berkeley National Laboratory |
2:00 PM | FAIR data practices | Herbert J. Bernstein | Ronin Institute for Independent Scholarship |
2:30 PM | CXIDB | Filipe Maia | CXIDB; Uppsala University |
3:00 PM | PDB and data sharing | Steve Burley | RCSB PDB |
3:30 PM | Break (10 mins) | | |
3:40 PM | Deleting data, and other forms of lossy compression | James Holton | Lawrence Berkeley National Laboratory |
4:10 PM | Storage and other considerations | Jana Thayer | SLAC National Accelerator Laboratory |
4:20 PM | Breakout sessions | | |
5:30 PM | Breakout sessions and conclusion | | |