Wednesday, 13 July 2016

Overview of HDFS

HDFS is a distributed filesystem that runs across a cluster of machines and is used by the Hadoop framework. An HDFS cluster is composed of one Namenode and several Datanodes.

The role of the Namenode is to coordinate the Datanodes, keep the map (the metadata) of the filesystem, and handle filesystem operations such as reading and writing files. The Datanodes store the actual blocks of data.
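To make this concrete, here is a minimal sketch of a client writing and reading a file through the Hadoop FileSystem API in Java. The cluster URI (hdfs://namenode:9000) and the path /demo/hello.txt are placeholders for illustration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.net.URI;

public class HdfsReadWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The client contacts the Namenode only for metadata; the file
        // bytes stream directly between the client and the Datanodes.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);

        Path file = new Path("/demo/hello.txt");

        // Write: the Namenode chooses Datanodes for each block.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeUTF("Hello, HDFS!");
        }

        // Read: the Namenode returns the block locations, then the
        // client fetches the bytes from the Datanodes.
        try (FSDataInputStream in = fs.open(file)) {
            System.out.println(in.readUTF());
        }
        fs.close();
    }
}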

[Figure: HDFS cluster with one Namenode, several Datanodes, and replicated blocks]
The Namenode periodically receives heartbeats from the Datanodes, which is how it knows which ones are alive. Blocks of data can be replicated across several Datanodes; we can see the replicated blocks in the figure. By default, the replication factor is 3.
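As a sketch, the replication factor can be changed and inspected per file through the same FileSystem API; the URI and path below are again illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.net.URI;

public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);

        Path file = new Path("/demo/hello.txt");

        // Ask the Namenode to keep 2 replicas of this file's blocks
        // instead of the cluster default of 3.
        fs.setReplication(file, (short) 2);

        // The replication factor is part of the file's metadata,
        // which the Namenode tracks.
        FileStatus status = fs.getFileStatus(file);
        System.out.println("Replication factor: " + status.getReplication());
    }
}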

Checksums are used to detect corruption: HDFS computes them when files are written and verifies them when files are read.
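A small sketch of the checksum-related calls in the FileSystem API: verification on read is enabled by default, and HDFS can also report a checksum for a whole file. The URI and path are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.net.URI;

public class ChecksumDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);

        // Checksum verification on read is on by default; it can be
        // toggled explicitly.
        fs.setVerifyChecksum(true);

        // A checksum computed over the file's blocks, useful for
        // comparing two copies of the same file.
        FileChecksum checksum = fs.getFileChecksum(new Path("/demo/hello.txt"));
        System.out.println(checksum);
    }
}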

HDFS is a fault-tolerant system: replication protects files against hardware failures, and network failures are mitigated by spreading replicas across multiple racks, each served by its own switch.
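Rack awareness is pluggable: HDFS resolves node addresses to rack names through the DNSToSwitchMapping interface (usually wired up via the net.topology.* configuration properties), so that replicas can be placed on different racks. Below is a toy, purely illustrative implementation; the rack names and the even/odd rule are invented for the example.

import org.apache.hadoop.net.DNSToSwitchMapping;
import java.util.ArrayList;
import java.util.List;

public class ToyRackMapping implements DNSToSwitchMapping {
    @Override
    public List<String> resolve(List<String> names) {
        List<String> racks = new ArrayList<>();
        for (String name : names) {
            // Invented rule for illustration: hosts whose name ends
            // in an even digit sit in rack1, the rest in rack2.
            int last = Character.getNumericValue(name.charAt(name.length() - 1));
            racks.add(last % 2 == 0 ? "/dc1/rack1" : "/dc1/rack2");
        }
        return racks;
    }

    @Override
    public void reloadCachedMappings() { }

    @Override
    public void reloadCachedMappings(List<String> names) { }
}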


