My current research interests are in Big Data management, data-intensive computing, HPC, cloud and edge computing, virtualization technology, and file and storage systems.
ANR KerStream (2017-2021). Sole Principal Investigator
I am leading the ANR JCJC KerStream project. This project aims to address the limitations of Hadoop when running stream Big Data applications on large-scale clouds and to go a step beyond Hadoop by proposing a new approach, called KERSTREAM, for scalable and resilient stream Big Data processing on clouds. The KerStream project can be seen as the first step towards developing the first French middleware that handles STREAM DATA processing at SCALE.
Budget: 238,000 EUR
Connect Talent Apollo (2017-2020). Sole Principal Investigator
I am leading the Apollo Connect Talent project. Apollo (Fast, efficient and privacy-aware Workflow executions in massively distributed Data-centers) is an individual "Connect Talent" research project. It aims to investigate novel scheduling policies and mechanisms for fast, efficient and privacy-aware data-intensive workflow executions in massively distributed data-centers.
Budget: 201,000 EUR
Hermes associated team (2019-2021). Principal Investigator
I am co-leading the Hermes associate team with Suren Byna. This is an associate team between Inria and Lawrence Berkeley National Laboratory. It aims at accelerating scientific Big Data application performance through coordinated data management that addresses the performance limitations of managing data across multiple sites. In particular, we focus on challenges related to the management of data and metadata across sites, distributed burst buffers, and online data analysis across sites.
Budget: 13,000 EUR a year.
IPL DISCOVERY Inria Project Lab (2015-2019). Member
DISCOVERY is an Inria Project Lab, started in 2015, that aims at exploring a new way of operating Utility Computing (UC) resources by leveraging any facilities available through the Internet in order to deliver widely distributed platforms that can better match the geographical dispersal of users as well as the unending demand. I was mainly working on data management in geo-distributed environments.
I co-supervised Jad Darrous (PhD student) with Christian Perez, investigating new VM deployment mechanisms that exploit locality and improve several aspects, including reliability, performance, and energy usage, in a distributed infrastructure.
The main results of this work have been published in MASCOTS 2019, ICCCN 2019, and CCGrid 2018.
JLESC: Joint Laboratory on Extreme-Scale Computing (2016-2018). Member
This laboratory is jointly run by Inria, University of Illinois at Urbana-Champaign (UIUC), Argonne National Laboratory (ANL) and Barcelona Supercomputing Center (BSC). My contribution: I am working on several sub-projects focused on developing and optimizing solutions to mitigate the impact of I/O interference on HPC application performance in next-generation Exascale machines, and on resource management and scheduling for data-intensive HPC workflows.
BigStorage is a European Training Network (ETN) whose main goal is to train future data scientists in order to enable them and us to apply holistic and interdisciplinary approaches for taking advantage of a data-overwhelmed world. This requires HPC and Cloud infrastructures with a redefinition of the storage architectures underpinning them, focusing on meeting highly ambitious performance and energy usage objectives.
I was involved in the project until 2017, co-supervising Mohammed-Yacine Taleb (PhD student) with Gabriel Antoniu and Toni Cortes and working on the energy impact of the most common Cloud and HPC usage scenarios, with an emphasis on data management.
I was also leading the recruitment group of the project.
Hemera Inria Large Wingspan-Project (2011-2014). Member
Hemera aims at demonstrating ambitious upscaling techniques for large-scale distributed computing by carrying out several dimensioning experiments on the Grid’5000 infrastructure, at animating the scientific community around Grid’5000, and at enlarging the Grid’5000 community by helping newcomers to make use of Grid’5000. My contribution: developing and optimizing data-intensive scientific applications on top of Grid’5000. I have illustrated the potential benefits of using Dynamic Voltage and Frequency Scaling (DVFS) and introduced more energy-efficient approaches based on MapReduce.
The main results of this work have been published in ARMS-CC 2014 and FGCS 2015.