Package sim2net.failure

This package provides a collection of process failure models.

A process failure occurs whenever a process does not behave according to its algorithm; here, the term process means the application running on one of the nodes in the simulated network. Process failure models are used to simulate such behaviors, and they differ in the nature and scope of the faults. Possible process failures include ([CGR11]): crashes (where a process at some time simply stops executing steps and never recovers); omissions (where a process does not send or receive messages that it is supposed to send or receive according to its algorithm); crashes with recoveries (where a process either crashes and never recovers, or keeps crashing and recovering infinitely often); eavesdropping (where a process leaks information obtained in its algorithm to an outside entity); and arbitrary failures (where a process may deviate in any conceivable way from its algorithm).

[CGR11] Christian Cachin, Rachid Guerraoui, Luís Rodrigues. Introduction to Reliable and Secure Distributed Programming, 2nd Edition. Springer-Verlag, 2011.

Module sim2net.failure._failure

Contains an abstract class that should be implemented by all process failure model classes.

class sim2net.failure._failure.Failure(name)

Bases: object

An abstract class that all process failure model classes should implement.

Parameters:
  • name (str): the name of the implemented process failure model.
logger

(Property) A logger object of the logging.Logger class with an appropriate channel name.

node_failure(failures)

Updates, in place, the information about nodes whose processes have failed according to the implemented process failure model.

Parameters:
  • failures (list): a list of boolean values whose size equals the total number of nodes in the simulated network; a True value at position \(i\) indicates that the process on node number \(i\) has failed.
random_generator

(Property) An object representing the sim2net.utility.randomness._Randomness pseudo-random number generator.
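The interface described above can be illustrated with a small stand-alone sketch. It does not import sim2net: FailureSketch and FailsNodeZero are hypothetical names, the logger and random_generator properties are omitted, and returning the newly failed nodes is an assumption borrowed from the Crash class documented in this package. The sketch is written for Python 3.

```python
from abc import ABC, abstractmethod


class FailureSketch(ABC):
    """Hypothetical stand-in for the abstract Failure class."""

    def __init__(self, name):
        self.name = name  # name of the process failure model

    @abstractmethod
    def node_failure(self, failures):
        """Mark failed processes in place; return the newly failed nodes."""


class FailsNodeZero(FailureSketch):
    """Hypothetical model: the process on node 0 fails on the first call."""

    def __init__(self):
        super().__init__('fails-node-zero')

    def node_failure(self, failures):
        if failures[0]:
            return []        # node 0 is already marked as failed
        failures[0] = True   # update the caller's list in place
        return [0]


failures = [False, False, False]
model = FailsNodeZero()
newly_failed = model.node_failure(failures)
```

After the call, the caller's failures list itself carries the updated state, which is what "in place" means here.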

Module sim2net.failure.crash

This module provides an implementation of the crash model.

In the crash model ([CGR11]), a process may at some time simply stop executing steps, and a process that has crashed never recovers. In this implementation, the failure of each process is determined randomly using the given crash probability, that is, the probability that a process will crash during the total simulation time. With this method, the times at which processes crash are distributed uniformly over the total simulation time. It is also possible to set up a transient period at the beginning of the simulation during which process failures do not occur, and to limit the total number of faulty processes to a given value.

class sim2net.failure.crash.Crash(time, nodes_number, crash_probability, maximum_crash_number, total_simulation_steps, transient_steps=0)

Bases: sim2net.failure._failure.Failure

This class implements the process crash model.

Note

It is presumed that the node_failure() method is called at each step of the simulation.

Parameters:
  • time: a simulation time object of the sim2net._time.Time class;
  • nodes_number (int): the total number of nodes in the simulated network;
  • crash_probability (float): the probability that a single process will crash during the total simulation time;
  • maximum_crash_number (int): the maximum number of faulty processes;
  • total_simulation_steps (int): the total number of simulation steps;
  • transient_steps (int): a number of steps at the beginning of the simulation during which no crashes occur (default: 0).
Raises:
  • ValueError: raised when the given value of the time object is None; or when the given number of nodes is less than or equal to zero; or when the given crash probability is less than zero or greater than one; or when the given value of the maximum number of faulty processes or the given value of the total simulation steps is less than zero; or when the number of steps in the transient period is less than zero or greater than the given value of the total simulation steps.
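The conditions listed under Raises can be read as a sequence of argument checks. The following stand-alone sketch mirrors them one by one; check_crash_arguments is a hypothetical helper and the error messages are illustrative, not taken from sim2net.

```python
def check_crash_arguments(time, nodes_number, crash_probability,
                          maximum_crash_number, total_simulation_steps,
                          transient_steps=0):
    """Hypothetical helper mirroring the documented ValueError conditions."""
    if time is None:
        raise ValueError('the time object must not be None')
    if nodes_number <= 0:
        raise ValueError('nodes_number must be greater than zero')
    if not 0.0 <= crash_probability <= 1.0:
        raise ValueError('crash_probability must be between 0 and 1')
    if maximum_crash_number < 0:
        raise ValueError('maximum_crash_number must not be negative')
    if total_simulation_steps < 0:
        raise ValueError('total_simulation_steps must not be negative')
    if not 0 <= transient_steps <= total_simulation_steps:
        raise ValueError('transient_steps must be between 0 and '
                         'total_simulation_steps')


# Arguments like those used in the documented examples pass the checks;
# object() stands in for a real time object.
check_crash_arguments(object(), 4, 0.0, 0, 2)
```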
_Crash__crashes(nodes_number, crash_probability, maximum_crash_number, total_simulation_steps, transient_steps)

Determines the faulty processes and their crash times using the given crash probability. It is also possible to set up a transient period at the beginning of the simulation during which process failures do not occur, and to limit the total number of faulty processes to a given value.

Parameters:
  • nodes_number (int): the total number of nodes in the simulated network;
  • crash_probability (float): the probability that a single process will crash during the total simulation time;
  • maximum_crash_number (int): the maximum number of faulty processes;
  • total_simulation_steps (int): the total number of simulation steps;
  • transient_steps (int): a number of steps at the beginning of the simulation during which no crashes occur (default: 0).
Returns:
A list of tuples; each tuple contains an identifier of the node with faulty process and its time of crash (in simulation steps). The list is sorted in ascending order by crash times.
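One way such a list could be produced is sketched below. This is not the sim2net implementation: draw_crashes and its rng parameter are hypothetical, and the sketch assumes that each node crashes independently with the given probability, that crash steps are drawn uniformly from the inclusive range between transient_steps and total_simulation_steps, and that nodes are visited in random order so the cap does not favor low node identifiers.

```python
import random


def draw_crashes(nodes_number, crash_probability, maximum_crash_number,
                 total_simulation_steps, transient_steps=0, rng=None):
    """Hypothetical sketch: pick faulty nodes and uniform crash steps."""
    rng = rng or random.Random()
    nodes = list(range(nodes_number))
    rng.shuffle(nodes)  # visit nodes in random order (assumption)
    crashes = []
    for node in nodes:
        if len(crashes) >= maximum_crash_number:
            break  # the cap on faulty processes has been reached
        if rng.random() < crash_probability:
            # Crash step drawn uniformly outside the transient period.
            step = rng.randint(transient_steps, total_simulation_steps)
            crashes.append((node, step))
    crashes.sort(key=lambda pair: pair[1])  # ascending by crash step
    return crashes


# With a cap of 0 no process is ever scheduled to crash, whatever the
# probability; the same holds for a probability of 0.
no_crashes_cap = draw_crashes(4, 1.0, 0, 2)
no_crashes_prob = draw_crashes(4, 0.0, 4, 2)
```

Passing a seeded random.Random instance as rng makes the drawn schedule reproducible, which is convenient when comparing simulation runs.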
node_failure(failures)

Updates, in place, the information about nodes whose processes have failed according to the crash model.

Parameters:
  • failures (list): a list of boolean values whose size equals the total number of nodes in the simulated network; a True value at position \(i\) indicates that the process on node number \(i\) has failed.
Returns:
A list of nodes whose processes failed at the current simulation step.
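The per-step behavior described above could be sketched as follows, assuming the object holds a schedule of (node, crash step) tuples sorted by crash step. The function node_failure_sketch and the explicit current_step argument are hypothetical; the real method obtains the current step from its time object.

```python
def node_failure_sketch(crashes, current_step, failures):
    """Hypothetical sketch: apply the crashes due at current_step.

    crashes is a list of (node, step) tuples sorted by step; entries
    that are due are removed from it.
    """
    crashed_now = []
    # "<=" rather than "==" so that a skipped step does not lose a crash.
    while crashes and crashes[0][1] <= current_step:
        node, _ = crashes.pop(0)
        failures[node] = True   # mark the failure in the caller's list
        crashed_now.append(node)
    return crashed_now


schedule = [(2, 1), (0, 3)]
failures = [False, False, False]
step_results = [node_failure_sketch(schedule, step, failures)
                for step in range(4)]
```

Because the schedule is sorted, only its head needs to be inspected at each step, which keeps the per-step cost low when the method is called at every simulation step.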

Examples:

To avoid any process failures, set the crash_probability and/or maximum_crash_number parameters to 0, as in the examples below.

>>> from sim2net._time import Time
>>> from sim2net.failure.crash import Crash
>>> clock = Time()
>>> clock.setup()
>>> crash = Crash(clock, 4, 0.0, 0, 2)
>>> failures = [False, False, False, False]
>>> clock.tick()
(0, 0.0)
>>> crash.node_failure(failures)
[]
>>> print failures
[False, False, False, False]
>>> clock.tick()
(1, 1.0)
>>> crash.node_failure(failures)
[]
>>> print failures
[False, False, False, False]

>>> clock = Time()
>>> clock.setup()
>>> crash = Crash(clock, 4, 1.0, 0, 2)
>>> failures = [False, False, False, False]
>>> clock.tick()
(0, 0.0)
>>> crash.node_failure(failures)
[]
>>> print failures
[False, False, False, False]
>>> clock.tick()
(1, 1.0)
>>> crash.node_failure(failures)
[]
>>> print failures
[False, False, False, False]