Monday, 12 January 2015

Perspectives on the CAP theorem

Paper here

This paper is a reflection about the CAP theorem and it places the theorem within a broader context of distributed computing theory.

CAP theorem appeared in the context of web services. A web service is implemented by a set of servers, perhaps distributed over a set of geographically distant data centers. The CAP theorem, also known as Brewer's theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:

  1. Consistency (all nodes see the same data at the same time)
  2. Availability (a guarantee that every request receives a response about whether it succeeded or failed)
  3. Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)

There is a trade-off that has been much discussed ever since: the impossibility of guaranteeing both safety, and liveness in an unreliable distributed system.

Consistency (as defined in the CAP Theorem) is a classic safety property: every response sent to a client is correct. I remember that an algorithm is safe or consistent if nothing bad ever happens. In case of web services, consistency means to return the right response to the client request. Thus, consistency depends on the service provided.

Availability is a classic liveness property: eventually, every request receives a response. An algorithm is live if eventually something good happen. Availability simply means that each request eventually receive a response. Obviously, a fast response is better than a slow response, but this property can be acceptable, or not, according to the system requirements. In a real-time system, a response that is late can be sufficient to create problems. In a purchase in the Amazon site, this is acceptable.

Partitions, crash failures, message loss, malicious attacks can turn a system unreliable. The CAP theorem only cares about network partitions. Network partitions can happen because some servers become unreachable, or some router partitioned the network. This is a common failure that an effect on distributed systems.

The paper states that it is impossible to achieve both consistency and availability in an unreliable distributed system, and it is necessary to sacrifice one of these two properties, or even both properties.

  1. There are systems that guarantee strong consistency and provide best effort availability.
  2. There are systems that guarantee availability, and provide best effort consistency.
  3. Some systems may sacrifice both consistency and availability.

This sacrifice is acceptable if the characteristics of the service that is designed accepts it. If not, the service is not feasible.

No comments:

Post a Comment