Friday, March 25, 2016

Default replication factor in HDFS

The default replication factor is set to 3 in HDFS, and I can explain it as follows.

As you know, Hadoop runs in a clustered environment: each cluster has multiple racks, and each rack has multiple DataNodes.

So to make HDFS fault tolerant in your cluster, you need to consider the following failures:
  1. DataNode failure
  2. Rack failure

The chance of the whole cluster failing is fairly low, so let's not think about it. For the two failure cases above, you need to make sure that:

  1. If one DataNode fails, you can get the same data from another DataNode
  2. If an entire Rack fails, you can get the same data from another Rack (see the sketch after this list for a way to check where replicas actually land)
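
You can see this placement for yourself. Below is a minimal sketch using the HDFS Java client API; the path /user/demo/sample.txt is just a placeholder, so point it at any file in your cluster. For each block of the file it prints the DataNodes that hold a replica and, when rack awareness is configured, their rack topology paths.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicaLocations {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS and friends from the cluster config on the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/demo/sample.txt"); // placeholder path
        FileStatus status = fs.getFileStatus(file);

        // One BlockLocation per HDFS block; each lists every replica's host
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("Block at offset " + block.getOffset() + ":");
            for (String host : block.getHosts()) {
                System.out.println("  replica on DataNode " + host);
            }
            // With rack awareness configured, these look like /rack1/host
            for (String topo : block.getTopologyPaths()) {
                System.out.println("  topology path " + topo);
            }
        }
        fs.close();
    }
}

The same information is available from the command line with hdfs fsck /user/demo/sample.txt -files -blocks -racks.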

So now it's pretty clear why the default replication factor is set to 3: with three replicas, HDFS can place them so that no two replicas go to the same DataNode and at least one replica goes to a different Rack, fulfilling both of the fault-tolerance criteria above. In fact, the default block placement policy writes the first replica on the writer's DataNode, the second on a node in a different rack, and the third on a different node in that same remote rack. Hope that answers your question.
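
The default itself comes from the dfs.replication property in hdfs-site.xml, and replication is really a per-file attribute that can be changed at any time. Here is a minimal sketch using the HDFS Java client, again with /user/demo/sample.txt as a placeholder path:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationFactorDemo {
    public static void main(String[] args) throws Exception {
        // Reads core-site.xml / hdfs-site.xml from the classpath
        Configuration conf = new Configuration();

        // dfs.replication is the cluster-wide default; it is 3 unless overridden
        int defaultReplication = conf.getInt("dfs.replication", 3);
        System.out.println("Default replication factor: " + defaultReplication);

        FileSystem fs = FileSystem.get(conf);

        // Replication is stored per file; the NameNode re-replicates
        // (or removes excess replicas) asynchronously after this call
        Path file = new Path("/user/demo/sample.txt"); // placeholder path
        fs.setReplication(file, (short) 2);

        System.out.println("File replication is now: "
                + fs.getFileStatus(file).getReplication());
        fs.close();
    }
}

From the shell, hdfs dfs -setrep -w 2 /user/demo/sample.txt does the same thing, with -w waiting until re-replication completes.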
