SHADI IBRAHIM

Tenured Inria Research Scientist

Inria Rennes – Bretagne Atlantique Research Center

Contact details
Inria Rennes – Bretagne Atlantique
Campus Universitaire de Beaulieu
35042 Rennes, France

Phone: +33 (0) 2 99 84 25 34
Fax: +33 (0) 2 99 84 71 71
Email: shadi DOT ibrahim AT inria DOT fr


 

Job Openings

Two Post-doc positions funded by the ANR KerStream project and one Post-doc position funded by the Apollo Connect Talent project are available. Please feel free to email me your resume.


Post-Doc Position [Towards scalable and reliable Big Data stream computations on Clouds]

Advisor: Shadi Ibrahim (STACK team)

Main contact: shadi.ibrahim (at) inria.fr

Application deadline: as early as possible

Location: Inria, Nantes

Stream data processing applications are emerging as first-class citizens in both industry and academia; examples include monitoring click-through rates of links in social networks, abuse prevention, etc. Hadoop MapReduce cannot handle stream data applications, as it requires data to be stored in a distributed file system before it can be processed. Several systems have therefore been introduced for stream data processing, such as Flink [1], Spark Streaming [2], Storm [3], and Google MillWheel [4]. These systems keep computations in memory for low latency and preserve scalability by partitioning the data or by dividing the streams into a set of deterministic batch computations. However, they are designed for dedicated environments and do not account for the performance variability (e.g., in network, I/O) caused by resource contention in the cloud. This variability may in turn cause high and unpredictable latency when output streams are transmitted for further analysis. Moreover, these systems overlook the dynamic nature of data streams and the volatility of their computation requirements. Finally, they still address failures in a best-effort manner.

The goal of this work is to propose new approaches for reliable Big Data stream processing on clouds by (1) exploring new mechanisms that expose resource heterogeneity (i.e., the variability in resource utilization observed at runtime) when scheduling stream data applications; and (2) investigating how to automatically adapt to node failures, tailoring the failure-handling techniques to the characteristics of the running application and to the root cause of the failures.
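To give a flavor of point (1), the sketch below shows a crude form of variability-aware placement: a stream operator is assigned to the node with the lowest recently observed utilization. This is only an illustration, not part of the project; the node names, the utilization metric, and the function name are invented for the example.

```python
# Hypothetical sketch: variability-aware operator placement.
# Node names and utilization figures are made up for illustration.

def assign_operator(nodes):
    """Assign a stream operator to the node with the lowest
    recently observed utilization (a crude signal of the runtime
    heterogeneity caused by resource contention in the cloud)."""
    return min(nodes, key=lambda name: nodes[name]["util"])

observed = {
    "vm-1": {"util": 0.82},  # contended (e.g., a noisy neighbor)
    "vm-2": {"util": 0.35},
    "vm-3": {"util": 0.55},
}
target = assign_operator(observed)  # -> "vm-2"
```

A real system would of course track several metrics (network, I/O, CPU) over time rather than a single utilization number.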

[1] “Apache Flink,” https://flink.apache.org.

[2] M. Zaharia, T. Das, H. Li, T. Hunter, S. Shenker, and I. Stoica, “Discretized streams: Fault-tolerant streaming computation at scale,” in Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, ser. SOSP ’13, 2013, pp. 423–438.

[3] “Storm,” http://storm-project.net/.

[4] T. Akidau, A. Balikov, K. Bekiroglu, S. Chernyak, J. Haberman, R. Lax, S. McVeety, D. Mills, P. Nordstrom, and S. Whittle, “MillWheel: Fault-tolerant stream processing at Internet scale,” Proceedings of the VLDB Endowment, vol. 6, no. 11, pp. 1033–1044, 2013.

 

 


Post-Doc Position [Fast and efficient data-intensive workflow executions in distributed data centers]

Advisor: Shadi Ibrahim (STACK team)

Main contact: shadi.ibrahim (at) inria.fr

Application deadline: as early as possible

Location: Inria, Nantes

Many large companies now deploy their services globally to guarantee low latency to users around the world and to ensure high availability at low cost. For example, social networks like Facebook and Twitter store their data on geo-distributed data centers (DCs) to provide services worldwide with low latency. However, running data-intensive workflows on top of these geo-distributed environments (e.g., video stream processing, geo-distributed scientific data analytics) raises several challenges for existing data-intensive distributed workflow frameworks (e.g., MapReduce, Hadoop, Spark, TensorFlow), due to the low capacity of WAN links, the heterogeneity of resources, and the multiple levels of network heterogeneity in geo-distributed DCs. The goal of this work is to investigate novel scheduling policies and mechanisms for fast data-intensive workflow executions in massively distributed environments; in particular, how to improve the overall performance by considering the nature and size of workflow inputs (e.g., streams, fixed datasets) and intermediate data, the number of iterations, and the heterogeneity of resources when allocating tasks and jobs within and between DCs.
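One simple baseline for WAN-aware allocation is greedy data-locality placement: run each task in the DC holding most of its input, so that only the remainder crosses the scarce WAN links. The sketch below illustrates this idea; the DC names, byte counts, and function name are hypothetical, and a real policy would also weigh link bandwidths, resource heterogeneity, and intermediate data.

```python
# Hypothetical sketch: greedy data-locality placement across DCs.
# DC names and input sizes are made up for illustration.

def place_task(input_bytes_per_dc):
    """Place a task in the DC holding most of its input, and report
    how many bytes must still be fetched over WAN links."""
    best_dc = max(input_bytes_per_dc, key=input_bytes_per_dc.get)
    wan_bytes = sum(
        size for dc, size in input_bytes_per_dc.items() if dc != best_dc
    )
    return best_dc, wan_bytes

dc, moved = place_task({"rennes": 8_000, "nantes": 120_000, "paris": 4_000})
# -> dc == "nantes"; moved == 12_000 bytes fetched over the WAN
```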


Post-Doc Position [Resource management and scheduling for Stream Data Applications on Clouds] (Filled)

Advisor: Shadi Ibrahim (STACK team)

Main contact: shadi.ibrahim (at) inria.fr

Application deadline: as early as possible

Location: Inria, Nantes

Stream data processing applications are emerging as first-class citizens in both industry and academia. Moreover, it is now commonplace for an organization to use the same infrastructure for a variety of data-intensive applications. While a few works have focused on scheduling multiple data-intensive applications with mixed requirements (e.g., deadlines), none has considered mixing different types of applications (i.e., including stream data applications). Furthermore, when sharing cloud resources, fairness, consolidation, and performance come into question: for instance, how can we preserve high system utilization while avoiding QoS violations for the diverse applications? To address these challenges, an adaptive job scheduling framework will be developed. The framework will be equipped with several scheduling policies that can be adaptively tuned in response to the applications' behaviors and requirements.
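As a rough illustration of what "adaptively tuned" policies might mean, the sketch below selects a scheduling policy per job from simple runtime signals. Everything here is hypothetical (the policy names, the job attributes, and the selection rules are not part of the project); it only conveys the shape of the idea.

```python
# Hypothetical sketch: per-job scheduling-policy selection based on
# observed application behavior and requirements (illustration only).

def pick_policy(app):
    """Select a scheduling policy from simple runtime signals."""
    if app.get("deadline_ms") is not None:
        return "earliest-deadline-first"  # deadline-constrained jobs
    if app.get("input_type") == "stream":
        return "load-aware-partitioning"  # keep stream operators balanced
    return "fair-share"                   # default for batch jobs

jobs = [
    {"name": "clicks", "input_type": "stream"},
    {"name": "etl", "input_type": "batch", "deadline_ms": 60_000},
    {"name": "report", "input_type": "batch"},
]
assignments = {job["name"]: pick_policy(job) for job in jobs}
```

An actual framework would re-evaluate such decisions continuously as the applications' behavior changes, rather than once at submission time.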

 


© Shadi Ibrahim 2018