This article simulates the scenario of namenode directory corruption.
Disaster:
1. Shutdown the secondary namenode:

/etc/init.d/hadoop-hdfs-secondarynamenode stop

2. Force a checkpoint on the secondary namenode:

hdfs secondarynamenode -checkpoint force

3. Shutdown the namenode:

/etc/init.d/hadoop-hdfs-namenode stop

Currently on the namenode:
-rw-r--r--. 1 hdfs hadoop   37385 Jun 14 12:29 fsimage_0000000000000011104
-rw-r--r--. 1 hdfs hadoop      62 Jun 14 12:29 fsimage_0000000000000011104.md5
-rw-r--r--. 1 hdfs hadoop     441 Jun 14 14:23 edits_0000000000000011105-0000000000000011112
-rw-r--r--. 1 hdfs hadoop      30 Jun 14 14:24 edits_0000000000000011113-0000000000000011114
-rw-r--r--. 1 hdfs hadoop      30 Jun 14 14:24 edits_0000000000000011115-0000000000000011116
-rw-r--r--. 1 hdfs hadoop      30 Jun 14 14:37 edits_0000000000000011117-0000000000000011118
-rw-r--r--. 1 hdfs hadoop 1048576 Jun 14 14:37 edits_inprogress_0000000000000011119
-rw-r--r--. 1 hdfs hadoop       6 Jun 14 14:37 seen_txid

Currently on the secondary namenode:
-rw-r--r--. 1 root root  37466 Jun 14 14:37 fsimage_0000000000000011118
-rw-r--r--. 1 root root     62 Jun 14 14:37 fsimage_0000000000000011118.md5
drwxr-xr-x. 2 hdfs hadoop 12288 Jun 14 14:37 .
-rw-r--r--. 1 hdfs hadoop   208 Jun 14 14:37 VERSION

4. On the namenode, move dfs.namenode.name.dir to a different location, and create an empty directory:
[root@hdm name]# pwd
/data/nn/dfs/name
[root@hdm name]# mv current /tmp/backup_nn_current
[root@hdm name]# mkdir current
[root@hdm name]# chown hdfs:hadoop current

5. The namenode will then fail to start:
FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.io.IOException: NameNode is not formatted.
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:210)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:627)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:469)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:403)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:437)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:609)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:594)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1169)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1235)
INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
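The file names above encode transaction IDs: fsimage_0000000000000011118 on the secondary is the namespace as of txid 11118, each edits_X-Y file covers txids X through Y, and seen_txid on the namenode records the last txid seen. As a minimal sketch (txid_of is a hypothetical helper for illustration, not part of Hadoop), the txid can be pulled out of a file name like this:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): extract the last transaction ID
# encoded in an HDFS metadata file name.
txid_of() {
    local name=${1##*_}              # drop the "fsimage_" / "edits_" prefix
    name=${name##*-}                 # for an edits range, keep the end txid
    name=${name#"${name%%[1-9]*}"}   # strip leading zeros
    echo "${name:-0}"
}

txid_of fsimage_0000000000000011118                    # prints 11118
txid_of edits_0000000000000011105-0000000000000011112  # prints 11112
```

Comparing these numbers is how you can tell the secondary's checkpoint (11118) is only one transaction behind the namenode's in-progress edits file (11119).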
Recovery:
1. Create an empty directory specified in the dfs.namenode.checkpoint.dir configuration variable:

mkdir -p /data/secondary_nn/dfs/namesecondary
chown hdfs:hadoop /data/secondary_nn/dfs/namesecondary

2. Scp the fsimage and edit logs from the secondary namenode to the namenode's dfs.namenode.checkpoint.dir:
[root@hdw3 namesecondary]# pwd
/data/secondary_nn/dfs/namesecondary
[root@hdw3 namesecondary]# scp -r current hdm:/data/secondary_nn/dfs/namesecondary/

3. Change the owner and group on the namenode:
chown -R hdfs:hadoop /data/secondary_nn/dfs/namesecondary/*

4. Import the checkpoint on the namenode:
hdfs namenode -importCheckpoint

5. Restart the HDFS cluster.
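The import in step 4 will fail if dfs.namenode.name.dir still contains a legal image, or if the checkpoint directory holds nothing usable. A hypothetical pre-flight check (a sketch for illustration, not a Hadoop tool; it assumes the standard current/ layout shown in the listings above) can verify both conditions before running -importCheckpoint:

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight sketch (not part of Hadoop) for the import step:
# dfs.namenode.name.dir must be empty and dfs.namenode.checkpoint.dir must
# contain a current/ directory with an fsimage_* file.
import_preflight() {
    local name_dir=$1 checkpoint_dir=$2
    # The name dir's current/ must be empty (or absent) or the import aborts.
    if [ -d "$name_dir/current" ] && \
       [ -n "$(ls -A "$name_dir/current" 2>/dev/null)" ]; then
        echo "name dir not empty"; return 1
    fi
    # The copied checkpoint must actually contain an fsimage.
    if ! ls "$checkpoint_dir"/current/fsimage_* >/dev/null 2>&1; then
        echo "no fsimage in checkpoint dir"; return 1
    fi
    echo "ok"
}
```

For example, import_preflight /data/nn/dfs/name /data/secondary_nn/dfs/namesecondary should print "ok" right before step 4 in this walkthrough, since the name dir was emptied in the disaster simulation and the secondary's files were copied in during step 2.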