System-Level Design of Fault-Tolerant Embedded Systems

by Alain Girault

Fault-tolerance is the ability of a system to maintain its functionality even in the presence of faults. With the advent of ubiquitous computing and distributed embedded systems, it is becoming increasingly crucial. We have added new functionalities to the SynDEx system-level CAD software. SynDEx is well suited to optimising distributed real-time embedded systems, and our new functionalities allow us to guarantee a specified fault-tolerance level for the generated embeddable code.

Our contribution to research on fault-tolerant embedded systems consists of several scheduling/distribution heuristics. Their common feature is that they take as input two graphs: a data-flow graph ALG describing the algorithm of the application, and a graph ARC describing the target distributed architecture (see figure).

To the left is an example of an algorithm graph: it has nine operations (represented by circles) and 11 data dependences (represented by green arrows). Among the operations, one is a sensor operation (I), one is an actuator operation (O), and the seven others are computations (A to G). Below to the right is an example of an architecture graph: it has three processors (P1, P2, and P3) and three point-to-point communication links (L1.2, L1.3, and L2.3).

Also shown is a table giving the worst-case execution time of each operation on each processor and the worst-case transmission time of each data dependence on each communication link. Since the architecture is a priori heterogeneous, these times need not be identical across processors or links. The figure includes an example of such a table for the operations of ALG. The infinity sign expresses the fact that operation I cannot be executed by processor P3, for instance because I requires dedicated hardware that P3 lacks.
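To make the three inputs concrete, here is a minimal Python sketch of how they might be represented. Only the architecture (processors P1, P2, P3 and links L1.2, L1.3, L2.3) and the rule that I cannot run on P3 come from the text; the class names, the helper functions, the small four-operation algorithm graph, and all timing values are hypothetical and are not SynDEx's internal format.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

INF = math.inf  # plays the role of the table's infinity sign: "cannot run here"


@dataclass
class AlgorithmGraph:
    """Data-flow graph ALG: operations and data dependences (src, dst)."""
    operations: list[str]
    dependences: list[tuple[str, str]]

    def predecessors(self, op: str) -> list[str]:
        return [src for src, dst in self.dependences if dst == op]

    def topological_order(self) -> list[str]:
        """Kahn's algorithm; one iteration of ALG is assumed to be acyclic."""
        indegree = {op: 0 for op in self.operations}
        for _, dst in self.dependences:
            indegree[dst] += 1
        ready = [op for op in self.operations if indegree[op] == 0]
        order: list[str] = []
        while ready:
            op = ready.pop(0)
            order.append(op)
            for src, dst in self.dependences:
                if src == op:
                    indegree[dst] -= 1
                    if indegree[dst] == 0:
                        ready.append(dst)
        if len(order) != len(self.operations):
            raise ValueError("ALG contains a cycle")
        return order


@dataclass
class ArchitectureGraph:
    """Graph ARC: processors and point-to-point links between pairs of them."""
    processors: list[str]
    links: dict[str, tuple[str, str]]

    def link_between(self, p: str, q: str) -> str | None:
        """Name of the link connecting p and q, or None if there is none."""
        for name, ends in self.links.items():
            if set(ends) == {p, q}:
                return name
        return None


def allowed_processors(wcet: dict[str, dict[str, float]], op: str,
                       processors: list[str]) -> list[str]:
    """Processors with a finite worst-case execution time for `op`."""
    return [p for p in processors if wcet[op][p] != INF]


# ARC as described in the text.
ARC = ArchitectureGraph(
    processors=["P1", "P2", "P3"],
    links={"L1.2": ("P1", "P2"), "L1.3": ("P1", "P3"), "L2.3": ("P2", "P3")},
)

# Hypothetical four-operation fragment; NOT the nine-operation ALG of the figure.
ALG = AlgorithmGraph(
    operations=["I", "A", "B", "O"],
    dependences=[("I", "A"), ("I", "B"), ("A", "O"), ("B", "O")],
)

# Hypothetical worst-case execution times; only "I cannot run on P3" is from the text.
WCET = {
    "I": {"P1": 1.0, "P2": 1.5, "P3": INF},
    "A": {"P1": 3.0, "P2": 2.0, "P3": 2.5},
    "B": {"P1": 2.0, "P2": 2.0, "P3": 1.0},
    "O": {"P1": 1.0, "P2": 1.0, "P3": 1.0},
}

print(ALG.topological_order())                        # ['I', 'A', 'B', 'O']
print(allowed_processors(WCET, "I", ARC.processors))  # ['P1', 'P2']
print(ARC.link_between("P3", "P1"))                   # L1.3
```

Encoding "cannot execute" as an infinite execution time, as the table does, lets a heuristic filter candidate processors with a single comparison. Per-link transmission times for data dependences would be a second table of the same shape, keyed by link name; it is omitted here for brevity.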
From these three inputs, the heuristic distributes the operations of ALG onto the processors of ARC and schedules them statically, together with the communications induced by these decisions. The output of the heuristic is therefore a static schedule from which embeddable code can be generated.

Our fault hypothesis is that the hardware components are fail-silent: a component is either healthy and works correctly, or is faulty and produces no output at all. Recent studies on modern hardware architectures have shown that fail-silent behaviour can be achieved at a reasonable cost, so this hypothesis is realistic. Our contribution is the definition of several new scheduling/distribution heuristics that generate static schedules which also tolerate a fixed number of faults of hardware components (processors and/or communication links). They are implemented inside SynDEx as an alternative to its own default heuristic, called DSH (Distribution Scheduling Heuristic):
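The sketch below illustrates one generic way a static list-scheduling heuristic can tolerate K fail-silent processor faults: active replication, which places K+1 replicas of each operation on distinct processors. It is not presented as the author's heuristics nor as SynDEx's DSH; the function name `schedule_replicated`, the uniform communication cost `comm`, the chain I, A, O, and all timing values are assumptions, and link faults are not modelled.

```python
import math

INF = math.inf  # "cannot execute on this processor"


def schedule_replicated(order, preds, wcet, processors, k, comm=0.0):
    """Greedy list scheduling with active replication (hypothetical sketch).

    Each operation gets k + 1 replicas on distinct processors, so any k
    fail-silent processor faults still leave one replica of every operation.
    Times are those of the fault-free run; link faults are not modelled.
    Returns {op: [(processor, start, end), ...]}.
    """
    proc_free = {p: 0.0 for p in processors}  # when each processor becomes idle
    placement = {}
    for op in order:  # `order` must be a topological order of the graph
        candidates = []
        for p in processors:
            if wcet[op][p] == INF:
                continue  # this processor cannot execute op at all
            # Inputs are ready once the first replica of each predecessor
            # delivers; a copy from another processor costs `comm` extra.
            ready = 0.0
            for q in preds.get(op, []):
                arrival = min(end + (0.0 if rp == p else comm)
                              for rp, _, end in placement[q])
                ready = max(ready, arrival)
            start = max(ready, proc_free[p])
            candidates.append((start + wcet[op][p], start, p))
        if len(candidates) < k + 1:
            raise ValueError(f"{op}: fewer than {k + 1} processors can execute it")
        candidates.sort()  # earliest finish time first
        placement[op] = []
        for end, start, p in candidates[: k + 1]:
            placement[op].append((p, start, end))
            proc_free[p] = end
    return placement


# Hypothetical chain I -> A -> O on P1, P2, P3; timings are made up.
ORDER = ["I", "A", "O"]
PREDS = {"A": ["I"], "O": ["A"]}
WCET = {
    "I": {"P1": 1.0, "P2": 2.0, "P3": INF},
    "A": {"P1": 3.0, "P2": 2.0, "P3": 2.0},
    "O": {"P1": 1.0, "P2": 1.0, "P3": 1.0},
}
PLACEMENT = schedule_replicated(ORDER, PREDS, WCET, ["P1", "P2", "P3"], k=1, comm=0.5)
for op, replicas in PLACEMENT.items():
    print(op, replicas)
```

With k = 1, every operation receives two replicas on distinct processors, so the loss of any single processor still leaves one replica of each operation. I is placed on P1 and P2 because P3 cannot execute it, and for the same reason asking for k = 2 raises an error. Tolerating link faults as well would require modelling each communication on a specific link, which this sketch leaves out.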