Wednesday 13 May 2015

C3: Cutting Tail Latency in Cloud Data Stores via Adaptive Replica Selection

In this paper, presented at NSDI '15, Suresh et al. introduce C3, an adaptive replica selection mechanism that works with Cassandra and is robust to performance variability in the environment.

Systems that respond to user actions very quickly (within 100 milliseconds) feel more fluid and natural to users than those that take longer. Improvements in Internet connectivity and the rise of warehouse-scale computing systems have enabled Web services that provide fluid responsiveness while consulting multi-terabyte datasets that span thousands of servers.

Large online services that need to create a predictably responsive whole out of less predictable parts are called latency tail-tolerant, or tail-tolerant for brevity.

Delivering consistently low latency for applications on the Internet is challenging. Some web applications retrieve data from databases and still require low, predictable latencies. Others rely on systems such as Hadoop, which is used for distributed computing over big data. If a web application processes client requests using Hadoop, you don't want to keep the client waiting indefinitely for a result because submitting every job to the same runtime has created a bottleneck. You need a way to deliver a fast response.

A recurring pattern for reducing tail latency is to exploit the redundancy built into each tier of the application architecture: a client node must choose one of several replica servers to serve each request.

The replica selection strategy has a direct effect on the tail of the latency distribution. This is particularly so in the context of data stores that rely on replication and partitioning for scalability, such as key-value stores. Replica selection can compensate for performance fluctuations across servers by preferring faster replicas whenever possible.

C3 is an adaptive replica selection mechanism that is robust in the face of fluctuations in system performance. It combines in-band feedback from servers, which clients use to rank and prefer faster replicas, with distributed rate control and backpressure, in order to reduce tail latencies in the presence of service-time fluctuations.
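To make the two ingredients more concrete, here is a minimal Python sketch of a client that ranks replicas using feedback piggybacked on responses and holds requests back when every replica has reached its send limit. This is not C3's actual algorithm: the class names (ReplicaState, Client), the scoring formula, the smoothing factor and the fixed per-window token count are all assumptions made for illustration, and the paper's scoring function and rate adaptation are more involved.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ReplicaState:
    """Client-side view of one replica (hypothetical, for illustration)."""
    alpha: float = 0.5         # EWMA smoothing factor (assumed value)
    queue_size: float = 0.0    # smoothed queue length reported by the server
    service_time: float = 1.0  # smoothed service time reported by the server (ms)
    outstanding: int = 0       # requests this client sent and is still waiting on
    tokens: int = 2            # sends still allowed in the current window

    def on_response(self, reported_queue, reported_service_ms):
        # In-band feedback: the server attaches its state to each response.
        a = self.alpha
        self.queue_size = a * reported_queue + (1 - a) * self.queue_size
        self.service_time = a * reported_service_ms + (1 - a) * self.service_time
        self.outstanding -= 1

    def score(self):
        # Assumed score: estimated requests ahead of ours times time per request.
        return (1 + self.outstanding + self.queue_size) * self.service_time


class Client:
    def __init__(self, replicas, rate_limit=2):
        self.rate_limit = rate_limit
        self.replicas = {name: ReplicaState(tokens=rate_limit) for name in replicas}
        self.backlog = deque()  # backpressure: requests waiting for capacity

    def send(self, request):
        # Try replicas from best (lowest) score to worst.
        ranked = sorted(self.replicas, key=lambda n: self.replicas[n].score())
        for name in ranked:
            state = self.replicas[name]
            if state.tokens > 0:
                state.tokens -= 1
                state.outstanding += 1
                return name
        # Every replica is at its limit: hold the request instead of sending.
        self.backlog.append(request)
        return None

    def new_window(self):
        # Refill the per-replica allowance and retry backlogged requests.
        for state in self.replicas.values():
            state.tokens = self.rate_limit
        pending, self.backlog = self.backlog, deque()
        return [(req, self.send(req)) for req in pending]
```

With two replicas and a limit of one request per window, the first two requests go to different replicas and a third is queued on the client until the next window. A slow response from one replica raises its score, so the queued request is then routed to the other one.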

Through comprehensive performance evaluations, the authors demonstrate that C3 improves Cassandra’s mean, median and tail latencies.


