Callum Hepworth, University of British Columbia
Abstract:
Imaging experiments at LCLS-I and LCLS-II pose ever-increasing challenges: they produce images at rates that will soon rise from 120 Hz to several kHz. At present, no readily usable tools exist to grasp the complexity of these datasets, understand their underlying features, and sift through them as they are being collected. Dimensionality reduction is a valuable first step toward providing this ability to beamline scientists and users. I implemented a lightweight tool in the LCLS analysis framework that performs principal component analysis (PCA) on the high-dimensional streaming data these experiments generate. The tool had to meet two requirements: 1) it must be able to run at all on the very large datasets generated (memory constraint), and 2) it must run fast enough to keep up with data collection (time constraint). An initial literature review pointed to variants of the PCA algorithm that can be applied to very large or streaming datasets, namely incremental PCA (iPCA). The iPCA steps that scale with image size proved rate-limiting and were streamlined using Message Passing Interface (MPI) parallelization. Ultimately this tool, dubbed parallelized incremental PCA (PiPCA), meets both the memory and time constraints while offering statistical accuracy comparable to batch PCA methods. PiPCA now serves as an introductory analysis tool at LCLS, to be used on archived data and eventually as a lightweight tool for live data analysis during beamtimes.
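To make the incremental idea concrete, here is a minimal single-process sketch of one common iPCA update, in which each new batch of flattened images is folded into a running rank-k model through a small SVD. It is not the PiPCA code: the function name `ipca_update`, the state layout, and the synthetic low-rank "images" are all assumptions made for illustration, and the MPI parallelization described in the abstract is not shown.

```python
import numpy as np

def ipca_update(state, batch, k):
    """Fold one batch (rows = flattened images) into a running rank-k PCA.

    Hypothetical helper for illustration only. `state` is None for the
    first batch, otherwise (n_seen, mean, singular_values, components).
    """
    batch = np.asarray(batch, dtype=float)
    m = batch.shape[0]
    batch_mean = batch.mean(axis=0)
    centered = batch - batch_mean
    if state is None:
        n_total, mean = m, batch_mean
        stacked = centered
    else:
        n, old_mean, s, vt = state
        n_total = n + m
        mean = (n * old_mean + m * batch_mean) / n_total
        # Extra row that accounts for the shift between old and new means.
        correction = np.sqrt(n * m / n_total) * (old_mean - batch_mean)
        # Previous model (compressed to k rows) stacked on the new data.
        stacked = np.vstack([s[:, None] * vt, centered, correction])
    _, s, vt = np.linalg.svd(stacked, full_matrices=False)
    return n_total, mean, s[:k], vt[:k]

# Synthetic stand-in for detector images: 200 frames of 50 "pixels",
# with rank-3 structure plus a constant offset (assumed data).
rng = np.random.default_rng(0)
images = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 50)) + rng.normal(size=50)

state = None
for start in range(0, images.shape[0], 20):
    # Only one batch of 20 frames is processed at a time.
    state = ipca_update(state, images[start:start + 20], k=3)
```

Per update, the SVD acts on a matrix with only k + batch size + 1 rows, so memory does not grow with the number of images seen. Its width is still the number of pixels, which is consistent with the abstract's observation that the steps scaling with image size are the rate-limiting ones.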
Poster Session Link: https://gather.town/invite?token=0pEoq7VP
If you have any questions for the presenter, please contact them via email: callumahepworth@gmail.com