Distributed monitoring of concurrent and asynchronous systems

Eric Fabre, Albert Benveniste, Claude Jard, and Stefan Haar

Developing applications over a distributed and asynchronous architecture, without relying on synchronization services, is set to become a central research track in distributed computing, particularly for autonomic computing and self-management. Distributed constraint solving, distributed observation, and distributed optimization are instances of such applications. This paper is about distributed observation: we investigate the problem of distributed monitoring of concurrent and asynchronous systems, with application to distributed fault management in telecommunications networks. Our approach combines two techniques: compositional unfoldings, to handle concurrency properly, and a variant of graphical algorithms and belief propagation, originating from statistics and information theory.

This work is partially supported by RNRT (National Research Network in Telecommunication) through the MAGDA project (Modelling and Learning for a Distributed Management of Alarms).

Informal introduction on an example (from CDC'2003) (pdf)
Plenary address at CONCUR'2003 (pdf)
Extended DISC journal paper version (pdf) -- recommended
Extended version of the former, IRISA Report (pdf)