Tuesday 25 November 2014

Lock-free vs. wait-free concurrency

There are two types of non-blocking thread synchronization algorithms - lock-free, and wait-free.

In lock-free systems, while any particular computation may be blocked for some period of time, all CPUs are able to continue performing other computations. Saying in other words, while a given thread might be blocked by other threads in a lock-free system, all CPUs can continue doing other useful work without stalls. Saying again in other words, if some threads block, the whole system keeps running.

A non-blocking algorithm is lock-free if there is guaranteed system-wide progress regardless of scheduling. Lock-free algorithms increase the overall throughput of a system by occasionally increasing the latency of a particular transaction. Most high- end database systems are based on lock-free algorithms, to varying degrees.

By contrast, wait-free algorithms ensure that in addition to all CPUs continuing to do useful work, no computation can ever be blocked by another computation. Confusing, isn't it? I am going to try to explain it simpler.

Lock-freedom

Lock-freedom allows individual threads to starve but guarantees system-wide throughput. An algorithm is lock-free if it satisfies that when the program threads are run sufficiently long at least one of the threads makes progress.

Wait-freedom

An algorithm is wait-free if every operation has a bound on the number of steps the algorithm will take before the operation completes, regardless of the execution speeds of other processes. Wait-freedom is the strongest non-blocking guarantee of progress, and ensure a high throughput without sacrificing latency of a particular transaction. They are also much harder to implement, test, and debug. Wait-free algorithms are always lock-free algorithms.

In a situation where a system handles dozens of concurrent transactions and has soft latency requirements, lock-free systems are a good compromise between development complexity and high concurrency requirements. A database server for a website is a good candidate for a lock-free design. While any given transaction might block, there are always more transactions to process in the meantime, so the CPUs will never stay idle. The challenge is to build a transaction scheduler that maintains a good mean latency, and a well bounded standard deviation.

In a scenario where a system has roughly as many concurrent transactions as CPU cores, or has hard real-time requirements, the developers need to spend the extra time to build wait-free systems. In these cases blocking a single transaction is not acceptable - either because there are no other transactions for the CPUs to handle, minimizing the throughput, or a given transaction needs to complete with a well defined non-probabilistic time period. Nuclear reactor control software is a good candidate for wait-free systems.

No comments:

Post a Comment