Friday 27 November 2015

NetAgg: Using Middleboxes for Application-specific on-path aggregation in data centres

Many applications in data centres (DCs) achieve horizontal scalability by adopting the aggregation pattern. This pattern can clog the network because of scarce inbound bandwidth or limited inter-rack bandwidth. In this paper, the authors propose NETAGG, a software middlebox platform that provides an on-path aggregation service.

A middlebox is a network appliance attached to a switch that provides services such as firewalls, web proxies, SSL offloading, and load balancing. To maximise performance, middleboxes are often implemented in hardware, and adopt a vertically integrated architecture focusing on a narrow function, e.g. processing standard packet headers or performing relatively simple payload inspection.

The authors use NETAGG to run application-specific aggregation functions that reduce the data at each hop, which lowers the chance of a network bottleneck and ultimately improves performance. NETAGG uses shim layers at edge servers to intercept application traffic and redirect it transparently to the agg boxes. The agg boxes are assumed to be attached to the network switches, where they perform on-path aggregation. In this figure, the traffic comes from top-of-rack (ToR) switches to a top switch that is connected to a middlebox. The traffic is relayed from the switch to the middlebox before continuing on its course.
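The idea of reducing data at each hop can be illustrated with a small sketch. This is not NETAGG's code or API: the word-count aggregation function, the two-level topology (one hypothetical agg box per ToR switch plus one at the top switch), and all names below are assumptions chosen for illustration. It only shows that an associative and commutative aggregation function can be applied hop by hop and still produce the same final result while fewer records cross the upper link.

```python
from collections import Counter


def aggregate(partials):
    """Merge partial word counts (an associative, commutative function)."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def on_path_aggregate(racks):
    """racks: a list of racks, each a list of per-server partial counts.

    Hypothetical two-hop layout: one agg box per ToR switch, then one
    agg box at the top switch.
    """
    # Hop 1: each rack's partial results are merged at its ToR agg box.
    per_rack = [aggregate(servers) for servers in racks]
    # Hop 2: the per-rack results are merged at the top switch's agg box.
    return aggregate(per_rack), per_rack


racks = [
    [Counter(a=2, b=1), Counter(a=1, c=4)],
    [Counter(b=3), Counter(a=5, b=1)],
]
result, per_rack = on_path_aggregate(racks)

# Records that would cross the inter-rack link without and with aggregation.
records_without = sum(len(server) for rack in racks for server in rack)
records_with = sum(len(rack_result) for rack_result in per_rack)
print(dict(result), records_without, records_with)
```

In this toy input, seven key/value records leave the servers but only five cross the link above the ToR switches, and the final counts are identical to what a single aggregator at the destination would compute.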

They tested NETAGG with Apache Hadoop and observed that the intermediate data size was reduced considerably, by a factor of 5x on 128 GB of intermediate data. In terms of time, NETAGG reduces the shuffle and reduce time by a factor of 4.5x. As a result, both the processing time and the disk I/O decrease. The only case in which NETAGG gives no advantage is when the job does not reduce the data, as in a sorting algorithm.
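To see why a sort job gains nothing, it helps to compare the output size of a data-reducing aggregation with that of a merge of sorted runs. The sketch below is an illustration under assumed toy inputs, not a measurement from the paper; the function names are hypothetical.

```python
import heapq
from collections import Counter


def combine_counts(partials):
    """Data-reducing aggregation: sum the values of records sharing a key."""
    total = Counter()
    for partial in partials:
        for key, value in partial:
            total[key] += value
    return sorted(total.items())


def merge_sorted(partials):
    """Non-reducing aggregation: merge sorted runs, keeping every record."""
    return list(heapq.merge(*partials))


count_inputs = [[("a", 1), ("b", 1), ("a", 1)], [("a", 1), ("b", 1)]]
sort_inputs = [[1, 4, 7], [2, 3, 9]]

counts_in = sum(len(p) for p in count_inputs)
counts_out = len(combine_counts(count_inputs))
sort_in = sum(len(p) for p in sort_inputs)
sort_out = len(merge_sorted(sort_inputs))
print(counts_in, counts_out, sort_in, sort_out)
```

The counting job forwards two records for five received, while the merge forwards all six, so an agg box placed on the path of a sort job cannot shrink the traffic.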

In conclusion, NETAGG is an on-path aggregation service that transparently intercepts aggregation flows at edge servers using shim layers and redirects them to aggregation nodes (agg boxes). Since NETAGG reduces the data to be transferred between switches, the overall performance of MapReduce jobs can be improved.
