Friday, June 13, 2014

Data node becomes dead due to one disk failure

Many data nodes have multiple disks, with each disk serving one data volume.
This article explains the action items to take when a data node fails or becomes dead because of a single disk failure.

Env:

Hadoop 2.0

Symptoms:

1. One data node is not shown in the output of "hdfs dfsadmin -report". For example, if you have 10 data nodes configured, the output only shows 9 data nodes available and 0 dead nodes.

eg:
Datanodes available: 9 (9 total, 0 dead)

2. However, the datanode service on the problematic node is still running.

[root@hdw1]# /etc/init.d/hadoop-hdfs-datanode status
datanode (pid  7938) is running...

3. In the datanode log, the following errors show up right after restarting the datanode service:

First, this FATAL error appears:
FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-xxx-xxx.xxx.x.x-xxxxxxx (storage id DS-xxx-192.168.xxx.x-xxxxx-xxxxxxx) service to namenode.OPENKB.INFO/192.168.xxx.2:8020
org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 4, volumes configured: 5, volumes failed: 1, volume failures tolerated: 0
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.<init>(FsDatasetImpl.java:186)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:34)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:30)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:857)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:819)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
        at java.lang.Thread.run(Thread.java:744)
Then it keeps printing the error below every few seconds:
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in BPOfferService for Block pool BP-xxx-xxx.xxx.x.x-xxxxxxx (storage id DS-xxx-192.168.xxx.x-xxxxx-xxxxxxx) service to namenode.OPENKB.INFO/192.168.xxx.2:8020
java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:439)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676)
        at java.lang.Thread.run(Thread.java:744)
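
If you need to locate the datanode log, a common path with this init-script packaging (an assumption; the actual location depends on your distribution and log4j settings) is:

tail -f /var/log/hadoop-hdfs/hadoop-hdfs-datanode-*.log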

4.  "hdfs dfsadmin -report" finally found the missing data node, but it will be marked as "dead".

Datanodes available: 9 (10 total, 1 dead)

Root Cause:

Each data node is configured with multiple data volumes, and each volume resides on one physical disk.
For example, the data node below has 3 data volumes configured in hdfs-site.xml:
<property>
    <name>dfs.datanode.data.dir</name>
    <value>/data1/dfs/data,/data2/dfs/data,/data3/dfs/data</value>
</property>
A disk issue occurred on volume /data3/dfs/data.
By default, the parameter "dfs.datanode.failed.volumes.tolerated" is set to 0. It controls the number of volumes that are allowed to fail before a datanode stops offering service; with the default of 0, any volume failure will cause the datanode to shut down.
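
To confirm the failure at the OS level before replacing the disk, a quick sanity check could look like the sketch below (the mount point matches the example above; adjust it to your layout):

# is the volume still mounted, and is it writable?
mount | grep /data3
touch /data3/dfs/data/disk_check && rm /data3/dfs/data/disk_check

# any I/O errors reported by the kernel?
dmesg | grep -i error | tail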

Fix:

After replacing the disk behind the problematic data volume:

1. Have the system administrator create the data volume, as sketched below.
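
A minimal sketch, assuming the replacement disk appears as /dev/sdd and is mounted back at /data3 (the device name, filesystem, and mount point are assumptions for illustration only):

mkfs.ext4 /dev/sdd1          # format the new partition
mkdir -p /data3              # recreate the mount point if needed
mount /dev/sdd1 /data3       # mount the new volume
# remember to update /etc/fstab so the mount survives a reboot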

2. Create the data directory specified by dfs.datanode.data.dir.

Eg:
mkdir -p /data3/dfs/data

3. Change the owner and group of the data directory.

chown hdfs:hadoop /data3/dfs/data
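
If the directory permissions are also in question, they should match dfs.datanode.data.dir.perm (700 by default in Hadoop 2; verify against your own hdfs-site.xml), e.g.:

chmod 700 /data3/dfs/data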

4. Start the datanode service.

/etc/init.d/hadoop-hdfs-datanode start

5. Confirm the data node has become a live node using the command below:

hdfs dfsadmin -report
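
If the recovery worked, the report should now show all nodes live, roughly like this for the 10-node example above:

Datanodes available: 10 (10 total, 0 dead)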

By the way, if you want to bring the data node up with only the valid data volumes and skip the broken one, just change dfs.datanode.failed.volumes.tolerated in hdfs-site.xml to the number of failed volumes and restart the datanode service.
eg:
<property>
    <name>dfs.datanode.failed.volumes.tolerated</name>
    <value>1</value>
</property>
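
After this change, restart the datanode service; with the init-script packaging shown above, that is roughly:

/etc/init.d/hadoop-hdfs-datanode restart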

3 comments:

  1. Hi, do we have to shut down the machine to replace the bad disk, or can we add it while the machine is on?

  2. I am facing a similar kind of issue, but I need to know if there is any way to reuse the same disk by deleting the block pool folder. I was able to traverse that disk's folders using Unix commands.

