Sunday, 14 June 2015

Several types of synchrony

The concept of partial synchrony in a distributed system is introduced, and it lies between the cases of a synchronous system and an asynchronous system. In a synchronous system, there is a known fixed upper bound Δ on the time required for a message to be sent from one processor to another and a known fixed upper bound Φ on the relative speeds of different processors. In an asynchronous system no fixed upper bounds Δ and Φ exist. In a partial synchronous system, in one version, the fixed bounds Δ and Φ exist, but they are not known a priori. On the other version, the bounds are know but they are only guaranteed to hold starting some time T.

One way to compare the several models of synchrony is with the problem of agreement. So, lets consider a collection of N processors, p1,,pN, which communicate by sending messages to one another. Initially each processor pi has a value vi drawn from some domain V of values, and the correct processors must all decide on the same value; moreover, if the initial values are all the same, say v, then v must be the common decision. In addition, the consensus protocol should operate correctly if some of the processors are faulty, for example, if they crash (fail-stop faults), fail to send or receive messages when they should (omission faults), or send erroneous messages (Byzantine faults).

Given the assumption about synchronism of the message delivery and the processor speed, we can characterize the model of the consensus by its resiliency - the maximum number of faults that can be tolerated. For example, it might be assumed that there is a fixed upper bound Δ on the time for messages to be delivered (communication is synchronous) and a fixed upper bound Φ on the rate at which one processor's clock can drift than another's (processors are synchronous), and that these bounds are known a priori and can be "built into" the protocol.

In the case that there are no bounds for message delivery or processor speed (asynchronous systems), it is impossible to have consensus because, in the case of process failures, we cannot know if a process has failed, or the message delivery is slow.

A well-known way to overcome this impossibility is to make partial synchrony assumptions about the system. The consensus problem is possible in this model if the speed of the processes are bounded and all links are eventually timely. If there is a process that the message is not delivered at a time T+Δ, the process is considered failed. This possibility result holds for crash failures (Scrash) and byzantine failures (Sbyz). Thus, consensus is achieved from n2f+1 and s3f+1 for crash and Byzantine failures, respectively, where n is the number of processes and f is the maximum number of processes that can fail.

Therefore, for the consensus problem, the resilience of the algorithm depends on the system model that we define.

Based on Cyntha Dwork, et al. paper [1], here is a table that show how many processes N are necessary to tolerate f faults in all type of systems.

Failure type Synchronous Asynchronous Partially synchronous communications and synchronous processors Partially synchronous communication and processors Partially synchronous processors and synchronous communication
Fail-stop f 2f+1 2f+1 f
Omission f 2f+1 2f+1 [2f, 2f+1]
Authenticated Byzantine f 3f+1 3f+1 2f+1
Byzantine 3f+1 3f+1 3f+1 3f+1

[1] Cynthia Dwork, et al., Consensus in the Presence of Partial Synchrony, 1988

No comments:

Post a Comment