Partial Order Techniques for Distributed Discrete Event Systems:

why you can't avoid using them

Eric Fabre and Albert Benveniste

Monitoring or diagnosis of large scale distributed Discrete Event Systems with asynchronous communication is a demanding task. Ensuring that the methods developed for Discrete Event Systems properly scale up to such systems is a challenge. In this paper we explain why the use of partial orders cannot be avoided in order to achieve this objective. To support this claim, we try to push classical techniques (parallel
composition of automata and languages) to their limits. We focus on on-line techniques, where a key difficulty is the choice of proper data structures to represent the set of all runs of a distributed system. We discuss the use of previously known structures such as execution trees and unfoldings. We propose an alternative and more compact data structure called trellis. We study the apparatus needed to extend the use of these data structures to represent distributed executions. And we show how such data structures can be used in performing distributed monitoring and diagnosis.

The techniques reported here were used in an industrial context for fault management and alarm correlation in telecommunications networks.

This report is an extended version of the plenary address that was given by the second author at WODES'2006.

This work is partially supported by RNRT (National Research Network in Telecommunication) through the SWAN project (Self aWare mANagement).

pdf