The concept of partial synchrony in a distributed system is introduced, and it lies between the cases of a synchronous system and an asynchronous system. In a synchronous system, there is a known fixed upper bound on the time required for a message to be sent from one processor to another and a known fixed upper bound on the relative speeds of different processors. In an asynchronous system no fixed upper bounds and exist. In a partial synchronous system, in one version, the fixed bounds and exist, but they are not known a priori. On the other version, the bounds are know but they are only guaranteed to hold starting some time .
One way to compare the several models of synchrony is with the problem of agreement. So, lets consider a collection of processors, , which communicate by sending messages to one another. Initially each processor has a value drawn from some domain of values, and the correct processors must all decide on the same value; moreover, if the initial values are all the same, say , then must be the common decision. In addition, the consensus protocol should operate correctly if some of the processors are faulty, for example, if they crash (fail-stop faults), fail to send or receive messages when they should (omission faults), or send erroneous messages (Byzantine faults).
Given the assumption about synchronism of the message delivery and the processor speed, we can characterize the model of the consensus by its resiliency - the maximum number of faults that can be tolerated. For example, it might be assumed that there is a fixed upper bound on the time for messages to be delivered (communication is synchronous) and a fixed upper bound on the rate at which one processor's clock can drift than another's (processors are synchronous), and that these bounds are known a priori and can be "built into" the protocol.
In the case that there are no bounds for message delivery or processor speed (asynchronous systems), it is impossible to have consensus because, in the case of process failures, we cannot know if a process has failed, or the message delivery is slow.
A well-known way to overcome this impossibility is to make partial synchrony assumptions about the system. The consensus problem is possible in this model if the speed of the processes are bounded and all links are eventually timely. If there is a process that the message is not delivered at a time , the process is considered failed. This possibility result holds for crash failures () and byzantine failures (). Thus, consensus is achieved from and for crash and Byzantine failures, respectively, where n is the number of processes and f is the maximum number of processes that can fail.
Therefore, for the consensus problem, the resilience of the algorithm depends on the system model that we define.
Based on Cyntha Dwork, et al. paper [1], here is a table that show how many processes are necessary to tolerate faults in all type of systems.
Failure type | Synchronous | Asynchronous | Partially synchronous communications and synchronous processors | Partially synchronous communication and processors | Partially synchronous processors and synchronous communication |
---|---|---|---|---|---|
Fail-stop | |||||
Omission | [, ] | ||||
Authenticated Byzantine | |||||
Byzantine |
[1] Cynthia Dwork, et al., Consensus in the Presence of Partial Synchrony, 1988
No comments:
Post a Comment