
Accelerating the Performance of Multi-Site Scientific Applications through Coordinated Data Management

Hermes is an Inria associate team between the Myriads team at Inria Rennes-Bretagne Atlantique and Lawrence Berkeley National Lab (LBNL).

Principal investigator (Inria): Shadi Ibrahim, Myriads team.
Principal investigator (partner): Suren Byna, Lawrence Berkeley National Lab, USA.



Distributed Burst Buffer management.

High-performance computing systems have been designed with a faster storage layer, called Burst Buffers, to absorb bursty I/O between the fast memory layer on compute nodes and the slow disk-based storage. The performance of scientific applications using Burst Buffers is strongly affected by the size of the Burst Buffer and by the latency of moving data to and from it. Previous solutions implement and optimize Burst Buffers for a single application running within a single supercomputing facility, but do not consider collaborative applications, which run across multiple sites and use distributed Burst Buffers.

To demonstrate the practical impact of distributed Burst Buffers for collaborative applications, we have developed a simple performance model that compares collaborative applications under three configurations: distributed Burst Buffers, a single centralized Burst Buffer, and direct use of the parallel file system. In 2021, we plan to develop distributed Burst Buffers for multi-site workflows (an internship subject has been defined and advertised for this purpose).
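
The performance model itself is not detailed here. As a rough illustration of how such a three-way comparison can be set up, the following is a minimal Python sketch of a toy two-site model. It assumes that site A produces data consumed at site B, that I/O and wide-area transfer times simply add up, that a full Burst Buffer spills the remainder to the parallel file system, and that distributed Burst Buffers can hide staging behind compute. All names and parameters (`Scenario`, `staged_io`, `overlap_s`, `bb_capacity_gb`, and so on) are hypothetical and are not taken from our model.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical two-site setting: site A produces data, site B consumes it."""
    data_gb: float         # volume written at site A and read at site B (GB)
    bb_gbps: float         # Burst Buffer bandwidth at each site (GB/s)
    bb_capacity_gb: float  # Burst Buffer capacity at each site (GB)
    pfs_gbps: float        # parallel file system bandwidth at each site (GB/s)
    wan_gbps: float        # wide-area bandwidth between the sites (GB/s)
    wan_latency_s: float   # wide-area latency (s)
    overlap_s: float       # compute time at site A that can hide background staging (s)


def wan_time(s: Scenario) -> float:
    """Time to move the whole data set from site A to site B."""
    return s.wan_latency_s + s.data_gb / s.wan_gbps


def staged_io(s: Scenario) -> float:
    """I/O time at one site when the Burst Buffer may overflow to the PFS."""
    in_bb = min(s.data_gb, s.bb_capacity_gb)
    spill = s.data_gb - in_bb
    return in_bb / s.bb_gbps + spill / s.pfs_gbps


def pfs_only(s: Scenario) -> float:
    # Write to the PFS at A, ship over the WAN, read from the PFS at B (all blocking).
    return s.data_gb / s.pfs_gbps + wan_time(s) + s.data_gb / s.pfs_gbps


def centralized_bb(s: Scenario) -> float:
    # One Burst Buffer at A: fast local write, but B reads it remotely over the WAN.
    return staged_io(s) + wan_time(s)


def distributed_bb(s: Scenario) -> float:
    # One Burst Buffer per site: A writes locally, data is staged A -> B in the
    # background (only the part not hidden by overlap_s is exposed), B reads locally.
    exposed = max(0.0, wan_time(s) - s.overlap_s)
    return staged_io(s) + exposed + staged_io(s)


def best_option(s: Scenario) -> str:
    """Name of the configuration with the lowest modeled time."""
    costs = {
        "pfs_only": pfs_only(s),
        "centralized_bb": centralized_bb(s),
        "distributed_bb": distributed_bb(s),
    }
    return min(costs, key=costs.get)
```

For instance, with 100 GB of data, 10 GB/s Burst Buffers with ample capacity, a 1 GB/s parallel file system, and a 1 GB/s wide-area link with 0.05 s latency, the sketch favors distributed Burst Buffers when 60 s of compute can hide staging (about 60 s, versus about 110 s for a centralized Burst Buffer and 300 s for the parallel file system), but favors the centralized Burst Buffer when no overlap is possible (about 120 s versus 110 s). Even this toy model suggests that the benefit of distributed Burst Buffers depends on how much wide-area staging can be overlapped with computation.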

Stream data processing in heterogeneous environments.

With Thomas Lambert and David Guyon, we developed a performance-aware task scheduling approach that places operators while taking into account network heterogeneity and node capacities (paper at CIKM 2020). We are now finalizing the implementation of our solution on top of Apache Storm, a mature and widely used stream data engine. We plan to validate this implementation with use cases from HPC applications running at LBNL (ongoing).
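
To make the idea of performance-aware placement concrete, the following is a minimal Python sketch of a greedy heuristic for a linear pipeline of operators. It is not the algorithm from the CIKM 2020 paper: the per-tuple cost (processing time scaled by node speed, plus transfer time over the link from the upstream operator), the slot-based node capacity, and all names (`Operator`, `place_pipeline`, `work_ms`, `in_tuple_kb`) are hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Operator:
    name: str
    work_ms: float      # processing time per tuple on a node with speed 1.0
    in_tuple_kb: float  # size of each tuple received from the upstream operator


def place_pipeline(
    ops: List[Operator],
    slots: Dict[str, int],                              # node -> free operator slots
    speed: Dict[str, float],                            # node -> relative speed
    bandwidth_kb_per_ms: Dict[Tuple[str, str], float],  # (src, dst) -> link bandwidth
    source_node: str,                                   # node where the stream enters
) -> Dict[str, str]:
    """Greedily place each operator, in pipeline order, on the node that
    minimizes its estimated per-tuple cost (compute + network transfer)."""
    free = dict(slots)  # copy so the caller's capacities are not modified
    placement: Dict[str, str] = {}
    upstream = source_node
    for op in ops:
        best_node, best_cost = None, float("inf")
        for node, n_free in free.items():
            if n_free == 0:
                continue  # capacity constraint: node has no free slot
            compute = op.work_ms / speed[node]
            if node == upstream:
                transfer = 0.0  # co-located with upstream: no network hop
            else:
                bw = bandwidth_kb_per_ms.get((upstream, node), 0.0)
                if bw <= 0.0:
                    continue  # no usable link from the upstream node
                transfer = op.in_tuple_kb / bw
            cost = compute + transfer
            if cost < best_cost:
                best_node, best_cost = node, cost
        if best_node is None:
            raise ValueError(f"no node can host operator {op.name!r}")
        placement[op.name] = best_node
        free[best_node] -= 1
        upstream = best_node
    return placement
```

For example, with a slow edge node (speed 1.0, one slot) hosting the stream source and a faster HPC node (speed 4.0, two slots) behind a 10 KB/ms link, a parse operator that receives 1000 KB tuples stays on the edge node, while downstream filter and aggregate operators that receive 1 KB tuples move to the HPC node. Being greedy and single-pass, the sketch never revisits earlier decisions, so it can miss placements that are better overall.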

