
This session builds upon an effort started at CUG 2019 and continued at SC19, in which several HPC centers gathered to discuss acceptance and regression testing procedures and frameworks. From that session, we learned that there are many commonalities in the procedures and tools used for system testing. CSCS, KAUST, and NERSC use the ReFrame framework for regression testing, while other centers, such as NCSA and OLCF, have built in-house tools for acceptance testing. From the experiences shared, we see that many widely run benchmarks and applications often become part of a local test suite. These common elements strongly indicate that tighter collaboration between centers would be beneficial. Furthermore, as systems become more complex, leveraging the HPC community to develop and maintain the growing number of tests needed to assess a system is key. The BOF will include lightning presentations from HPC centers that use different testing frameworks for regression and acceptance testing. These will be followed by discussions around the following topics: What challenges are centers currently facing in this area? What role should vendors play in testing? How can we leverage testing efforts across centers to develop a maintainable collection of tests? This BOF invites attendees to participate in forming the HPC System Test working group. The group will collaborate to define a set of guidelines and methodologies that can be used to build and maintain a collection of HPC system tests. All products from the session will be publicly available at: https://olcf.github.io/hpc-system-test-wg/
