This paper explains why distributed systems are hard to design and understand. The author starts by showing that, in the past, similar scientific problems could always be attacked with two approaches:
- Experimental observation
- Modelling analysis
These approaches are often presented as a dichotomy between "theory" and "practice", but the best results come when the two go together and each learns from the other. When devising a distributed system, it is necessary to create tractable models, governed by rules of interaction, that support analysis. Mathematical and logical formulas (theory) and computer simulations (practice) are both ways to create models, and these models help in devising a solution. Keep in mind that it is easy to create accurate models but hard to create tractable ones.
A model must let us identify two things:
- feasibility
- cost
The first means recognizing an unsolvable problem lurking beneath a system's requirements; the second means identifying the cost implications. With these two points settled, we have a yardstick against which any proposed solution can be evaluated. In the end, defining a model also refines the intuition we had when the problem was first introduced.
Many problems arise when designing a distributed system: coordination problems, synchronous versus asynchronous assumptions, election protocols, and failure models. The author shows why models matter for each of them.
- For coordination problems, it is impossible to create a protocol that makes two processes execute the same action in a scenario where faults can occur. This impossibility could not have been detected by intuition alone.
- For synchronous or asynchronous assumptions, the question is whether there are temporal limits on the speed of process execution or on message delivery delays.
= for asynchronous systems, no temporal limits are assumed.
= for synchronous systems, such limits are assumed, so schedulers must coordinate message handling and process execution to respect them, and performance degradation must be watched because it can violate the limits.
In practice, this means processes and communication channels must be implemented with care.
- Solving the election problem in an asynchronous system is not difficult, but it is expensive. In a synchronous system, the election problem can be solved with a single broadcast.
- There are several failure models, each assigning responsibility for faulty behaviour to the system's components: processors and communication channels. A process may be slow or may have crashed, and a message may merely be delayed; these cases are difficult to tell apart. So, in an asynchronous system, crash failures are harder to deal with than fail-stop failures; in a synchronous system, the crash and fail-stop models are equivalent.
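The cost difference in the election item can be made concrete with a small simulation. This is only a sketch under stated assumptions, not the paper's exact protocol: broadcasts are reliable, uids are distinct positive integers, clocks are perfectly synchronized, and `tau` is a known bound larger than any message delay. The names `elect_asynchronous`, `elect_synchronous`, `tau` and `delays` are hypothetical and chosen for illustration.

```python
def elect_asynchronous(uids):
    """No timing assumptions: every process broadcasts its uid, and each
    process elects the smallest uid it has received.
    Returns (leader, number_of_broadcasts)."""
    broadcasts = list(uids)  # one broadcast per process
    # Reliable broadcast (assumption): every process eventually sees all uids.
    views = [min(broadcasts) for _ in uids]
    assert len(set(views)) == 1  # all processes agree on the leader
    return views[0], len(broadcasts)


def elect_synchronous(uids, tau, delays):
    """Timing assumptions: process p broadcasts when its clock reads
    tau * uid, unless it has already received a broadcast by then.
    delays[uid] is the delivery delay of any broadcast to that process.
    Returns (leader, number_of_broadcasts)."""
    broadcasts = []  # (send_time, uid) pairs, in order of sending
    for fire_time, uid in sorted((tau * u, u) for u in uids):
        heard = any(sent + delays[uid] <= fire_time for sent, _ in broadcasts)
        if not heard:
            broadcasts.append((fire_time, uid))
    # The first process to broadcast is the leader.
    return broadcasts[0][1], len(broadcasts)


if __name__ == "__main__":
    uids = [7, 3, 9]
    print(elect_asynchronous(uids))
    # Delays within the bound: a single broadcast is enough.
    print(elect_synchronous(uids, 1.0, {u: 0.4 for u in uids}))
    # Delays beyond the bound: the single-broadcast property is lost.
    print(elect_synchronous(uids, 1.0, {u: 10.0 for u in uids}))
```

The last call shows why the synchronous assumption has to hold in the implementation: once delays exceed `tau`, other processes time out and broadcast as well, so the saving disappears.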
For practitioners, models help define the set of assumptions they must respect when devising a solution. For theoreticians, models offer a way to stay accurate while keeping the problems tractable.