Can a Gedae application be fault tolerant?
One requirement for many embedded systems is to carry out Built In Test of the underlying hardware resources and, when a hardware fault is identified, to remap the application while on line to exclude the faulty hardware.
In a final stand-alone Gedae application, would these issues need to be addressed by other control software, produced outside the Gedae environment and operating 'above' the Gedae command program interface, or is it possible to address both of them through appropriate flow graph design? That is:
- Is it possible to perform Built In Test using portions of a flow-graph to manipulate status data, derived from the underlying operating system or hardware?
- Is it possible to enable reconfigurability using appropriately designed flow-graphs, with redundant processing threads which can be switched in/out using valves?
Discussion
Gedae supports implementing fault tolerance and isolation using either command programs or Gedae flow graphs. In both cases the solution runs a health check task on each processor and reports that information to a central processor that controls the application. After some period of not hearing from an embedded processor, the application decides that the processor is dead and reassigns its duties to a spare processor. The two solutions described here are the extremes: the first does everything in a command program, and the second does everything on the embedded system. Intermediate solutions that mix the two are also possible.
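The timeout-and-reassign idea common to both solutions can be sketched in a few lines of Python. This is only an illustration of the logic, not Gedae code: the class `HealthMonitor`, the function `reassign`, and the processor and duty names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HealthMonitor:
    """Tracks the last report time of each processor (all names hypothetical)."""
    timeout: float  # seconds of silence before a processor is declared dead
    last_seen: dict = field(default_factory=dict)

    def report(self, processor: str, now: float) -> None:
        # Called whenever a health check report arrives from a processor.
        self.last_seen[processor] = now

    def dead_processors(self, now: float) -> list:
        # A processor is considered dead once it has been silent
        # for longer than the timeout.
        return sorted(p for p, t in self.last_seen.items()
                      if now - t > self.timeout)


def reassign(assignments: dict, dead: str, spares: list) -> dict:
    """Move the duties of a dead processor to the first available spare."""
    if not spares:
        raise RuntimeError("no spare processor available")
    spare = spares.pop(0)
    updated = dict(assignments)
    updated[spare] = updated.pop(dead)
    return updated
```

For example, with a 2 second timeout, a processor that last reported at time 0.0 is declared dead when checked at time 3.0, and its duties move to the first spare in the list.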
Using the Gedae Command Program Interface
Figure 1 illustrates a system that uses a command program, multiple launch packages, and health check tasks that were developed outside of Gedae. The command program monitors the health of each embedded processor by querying a health check task running on that processor. Alternatively, the health check task could report back periodically and the command program could monitor those reports. Because the health check and the Gedae launch package are separate tasks, the embedded processor must run a multitasking OS. When the command program determines that a processor has failed, it disconnects that processor, starts a launch package on a spare processor, and reconnects the source to the new launch package. The sequence of operations is:
- Recognize that an embedded processor (Processor 2 in Figure 2) is not replying.
- Disconnect the command program from the health check process.
- Disconnect the data source from the Gedae launch package running on the nonresponsive processor.
- Load launch package 2 on the spare processor.
- Connect the data source to the new launch package.
- Start the new launch package.
The result is illustrated in Figure 2.
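The ordering of the operations listed above can be sketched as follows. This is a toy stand-in that only records the steps it would perform; the class `CommandProgram` and its method and action names are hypothetical and do not correspond to the actual Gedae command program interface.

```python
class CommandProgram:
    """Toy stand-in for a command program; it only logs what it would do."""

    def __init__(self):
        self.log = []

    def _do(self, action, target):
        # Record the operation instead of talking to real hardware.
        self.log.append((action, target))

    def fail_over(self, failed, spare, package):
        # Precondition: the caller has already recognized that
        # `failed` is no longer replying to health checks.
        self._do("disconnect_health_check", failed)
        self._do("disconnect_source", failed)
        self._do("load_package", (package, spare))
        self._do("connect_source", spare)
        self._do("start_package", (package, spare))
        return self.log
```

Calling `CommandProgram().fail_over("processor2", "spare", "launch_package_2")` returns the five steps in the order given in the list. Note that the new launch package is started only after the data source has been connected to it.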


Using a Gedae Graph
Figure 3 is a flow graph that shows how one might implement fault detection and isolation within a Gedae flow graph. The same graph is annotated in Figure 4 to show how it would be mapped to processors. The function box Sink contains logic that feeds information back to the controller, indicating that a data set has been successfully processed. The function box Control monitors the health check inputs to determine whether any processors are not responding. As a double check, it also verifies that the Sink box has reported success. The Control box holds each data set on its input until the Sink box has reported that processing is complete, and then discards that data set. If the Control box determines that a processor is not responding, it reassigns any data sets that were assigned to that processor to another processor. Notice that in this scheme all the mode processors have equal status. There are simply enough of them to ensure that, even with the loss of 1 or 2 or N of them, there is still sufficient processing power to maintain the required throughput. This scheme depends on the source and sink processors not failing. Other schemes can be developed that provide redundant source and sink processors and improved fault tolerance.
Currently, the valve/merge combinations cannot be distributed, so they would be replaced with nondeterministic boxes that are available in the core Gedae library.
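The bookkeeping attributed to the Control box above (hold each data set until the Sink confirms it, and reassign outstanding data sets when a processor stops responding) can be modelled with a short Python sketch. It is an illustration only: the class `Control`, its methods, and the round-robin assignment policy are assumptions, not a description of how a Gedae function box is written.

```python
class Control:
    """Toy model of the Control box bookkeeping (not Gedae code)."""

    def __init__(self, processors):
        self.alive = list(processors)  # processors believed to be responding
        self.pending = {}              # data set id -> (data, assigned processor)
        self._next = 0                 # round-robin counter (assumed policy)

    def _pick(self):
        if not self.alive:
            raise RuntimeError("no processors left")
        proc = self.alive[self._next % len(self.alive)]
        self._next += 1
        return proc

    def dispatch(self, set_id, data):
        # Keep a copy of the data set until the Sink confirms completion.
        proc = self._pick()
        self.pending[set_id] = (data, proc)
        return proc

    def sink_done(self, set_id):
        # The Sink reported success, so the held copy can be discarded.
        self.pending.pop(set_id, None)

    def processor_failed(self, proc):
        # Drop the processor and reassign every data set it still owed.
        self.alive.remove(proc)
        moved = {}
        for set_id, (data, owner) in list(self.pending.items()):
            if owner == proc:
                new_owner = self._pick()
                self.pending[set_id] = (data, new_owner)
                moved[set_id] = new_owner
        return moved
```

Because the held copies are discarded only on a report from the Sink, a data set that was in flight on a failed processor is still available to be sent elsewhere.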
Comparing the Two Approaches
Generally, using a command program will be more cumbersome and slower but will use less program memory. Using a Gedae graph will generally provide quicker isolation but will use additional memory. The biggest difference between the implementations is that the command program loads and unloads smaller launch packages, whereas the Gedae graph contains all the necessary code. Loading and unloading launch packages and making and breaking connections is more time consuming than switching between code that is already resident in memory.

Figure 3

Figure 4