Hadoop HA

Motivation: wait until the time is right, until the wild geese fly south; I keep my rose warm so she won't wither.

1. Overview

Official documentation: http://hadoop.apache.org/docs/r2.7.6/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html

  • 1) HA (High Availability) means the service runs 7*24 without interruption.
  • 2) The key strategy for high availability is eliminating single points of failure. Strictly speaking, HA is per-component: HDFS HA and YARN HA.
  • 3) Before Hadoop 2.0, the NameNode was a single point of failure (SPOF) in an HDFS cluster.
  • 4) The NameNode affects HDFS cluster availability in two main ways:
If the NameNode machine fails unexpectedly (e.g. crashes), the cluster is unusable until an administrator restarts it
If the NameNode machine needs a software or hardware upgrade, the cluster is likewise unusable for the duration
HDFS HA solves this by configuring an Active/Standby pair of NameNodes, giving the cluster a hot standby for the NameNode.
On a failure such as a machine crash, or for planned upgrades and maintenance, the NameNode role can quickly be switched to the other machine.

2. How HDFS-HA Works

2.1 HDFS-HA Key Points

  • The metadata management scheme must change
Each NameNode keeps its own copy of the metadata in memory;
Only the Active NameNode may write to the edits log;
Both NameNodes can read the edits;
The shared edits live in shared storage (qjournal and NFS are the two mainstream implementations)
  • A state-management module is needed
A zkfailover process runs permanently on each NameNode host. Each zkfailover monitors its local NameNode and records its state in ZooKeeper; when a state switch is needed, zkfailover performs it, guarding against split-brain during the switch.
  • Passwordless SSH must work between the two NameNodes
  • Fencing: only one NameNode serves clients at any given moment
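The passwordless-SSH requirement can be verified ahead of time, before relying on sshfence. A minimal sketch (the host names are this cluster's; `BatchMode=yes` makes ssh fail fast instead of prompting for a password):

```shell
# Sketch: confirm passwordless SSH works before configuring sshfence.
# BatchMode=yes makes ssh fail instead of asking for a password.
check_passwordless_ssh() {
    local host=$1
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
        echo "$host: ok"
    else
        echo "$host: passwordless ssh FAILED"
        return 1
    fi
}

# Example (run on master, then repeat on slave1 for the reverse direction):
# check_passwordless_ssh slave1
```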

2.2 How HDFS-HA Automatic Failover Works

  The command hdfs haadmin -failover performs a manual failover; in that mode, even if the active NameNode has failed, the system will not switch from the active to the standby NameNode on its own. The following covers how to configure and deploy HA with automatic failover. Automatic failover adds two new components to an HDFS deployment: ZooKeeper and the ZKFailoverController (ZKFC) process, as shown in Figure 3-20. ZooKeeper is a highly available service for maintaining small amounts of coordination data, notifying clients when that data changes, and monitoring clients for failure. HA automatic failover relies on the following ZooKeeper features:

  • 1) Failure detection: each NameNode in the cluster maintains a persistent session in ZooKeeper. If the machine crashes, the session expires and ZooKeeper notifies the other NameNode that a failover should be triggered.
  • 2) Active NameNode election: ZooKeeper provides a simple mechanism to elect exactly one node as active. If the current active NameNode crashes, another node can acquire a special exclusive lock in ZooKeeper indicating that it should become the active NameNode.

  ZKFC is the other new component in automatic failover. It is a ZooKeeper client that also monitors and manages the NameNode's state. Every host running a NameNode also runs a ZKFC process. ZKFC is responsible for:

1) Health monitoring: ZKFC periodically pings its co-located NameNode with a health-check command.
As long as the NameNode responds in time with a healthy status, ZKFC considers the node healthy. If the node crashes,
freezes, or enters an unhealthy state, the health monitor marks it as unhealthy.

2) ZooKeeper session management: while the local NameNode is healthy, ZKFC keeps a session open in ZooKeeper.
If the local NameNode is active, ZKFC also holds a special znode lock.
The lock uses ZooKeeper's support for ephemeral nodes: if the session expires, the lock node is deleted automatically.

3) ZooKeeper-based election: if the local NameNode is healthy and ZKFC sees that no other node currently holds the znode lock,
it tries to acquire the lock for itself. If it succeeds, it has won the election
and runs the failover process to make its local NameNode Active.
The failover process is similar to the manual failover described earlier: first fence the previous active NameNode if necessary,
then transition the local NameNode to the Active state.
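The health-monitoring idea above can be sketched in shell. This is illustration only: the real ZKFC is a Java daemon inside Hadoop, but `hdfs haadmin -checkHealth` performs the same kind of RPC health probe:

```shell
# Illustration of ZKFC-style health monitoring (not the real ZKFC,
# which is a Java daemon): probe a NameNode with hdfs haadmin.
check_nn_health() {
    local service_id=$1              # nn1 or nn2, per hdfs-site.xml
    if hdfs haadmin -checkHealth "$service_id" >/dev/null 2>&1; then
        echo "$service_id: HEALTHY"
    else
        echo "$service_id: UNHEALTHY"
    fi
}

# Example: check_nn_health nn1
```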


3. HDFS-HA Cluster Configuration

IP Hostname Software
192.168.2.20 master NameNode,JournalNode,DataNode,ZK,NodeManager
192.168.2.21 slave1 NameNode,JournalNode,DataNode,ZK,ResourceManager,NodeManager
192.168.2.22 slave2 JournalNode,DataNode,ZK,NodeManager

4. Configure the ZooKeeper Cluster

Omitted here; refer to the earlier ZooKeeper notes.

5. Configure the HDFS-HA Cluster

Set up a fresh deployment environment

root@master:~# mkdir /app
root@slave1:~# mkdir /app
root@slave2:~# mkdir /app
root@master:~# tar xf /usr/local/src/hadoop-2.6.5.tar.gz -C /app/
root@master:~# mkdir -p /app/hadoop-2.6.5/dfs/{name,data}
root@master:~# mkdir /app/hadoop-2.6.5/data
root@master:~# mkdir /app/hadoop-2.6.5/tmp
root@master:~# cd /app/hadoop-2.6.5/etc/hadoop/
root@master:/app/hadoop-2.6.5/etc/hadoop# ls *.sh
hadoop-env.sh  httpfs-env.sh  kms-env.sh  mapred-env.sh  yarn-env.sh

Configure JAVA_HOME

root@master:/app/hadoop-2.6.5/etc/hadoop# which java
/usr/lib/jvm/jdk1.8.0_112/bin/java
root@master:/app/hadoop-2.6.5/etc/hadoop# find ./ -name "*.sh" | xargs grep -irn jdk1.8.0_112
./yarn-env.sh:23:export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_112
./yarn-env.sh:26:  JAVA_HOME=/usr/lib/jvm/jdk1.8.0_112
./hadoop-env.sh:25:export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_112
./mapred-env.sh:16:export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_112

Edit the configuration files

root@master:/app/hadoop-2.6.5/etc/hadoop# ll core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml 
-rw-rw-r-- 1 cmz  cmz   989 10月 12 11:21 core-site.xml
-rw-rw-r-- 1 cmz  cmz  2335 10月 12 11:11 hdfs-site.xml
-rw-r--r-- 1 root root 1103 10月 12 10:27 mapred-site.xml
-rw-rw-r-- 1 cmz  cmz  1895 10月 12 10:22 yarn-site.xml
core-site.xml
<configuration>
    <!-- Group the addresses of the two NameNodes into one cluster, mycluster -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
    </property>

    <!-- Directory for files Hadoop generates at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/app/hadoop-2.6.5/tmp</value>
    </property>
</configuration>
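A malformed XML file (a missing closing tag, a stray character) makes the daemons fail at startup with a confusing stack trace, so it is worth checking well-formedness before distributing the files. A sketch using python3's stdlib parser (assumes python3 is installed on the host; xmllint would work equally well):

```shell
# Sketch: check that a Hadoop config file is well-formed XML.
validate_xml() {
    if python3 -c 'import sys, xml.dom.minidom as m; m.parse(sys.argv[1])' "$1" 2>/dev/null; then
        echo "$1: OK"
    else
        echo "$1: BROKEN"
    fi
}

# Example:
# for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do validate_xml "$f"; done
```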
hdfs-site.xml
<configuration>
    <!-- replication: 1 for this lab; use 3 in production -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <!-- Name of the fully distributed cluster -->
    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>

    <!-- The NameNode nodes in the cluster -->
    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
    </property>

    <!-- RPC address of nn1 -->
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>master:8020</value>
    </property>
    <!-- RPC address of nn2 -->
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>slave1:8020</value>
    </property>

    <!-- HTTP address of nn1 -->
    <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>master:50070</value>
    </property>
    <!-- HTTP address of nn2 -->
    <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>slave1:50070</value>
    </property>

    <!-- Where the NameNode edits are stored on the JournalNodes -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://master:8485;slave1:8485;slave2:8485/mycluster</value>
    </property>

    <!-- Failover proxy provider: the class HDFS clients use to find the Active NameNode -->
    <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <!-- Disable permission checking -->
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>

    <!-- The sshfence fencing method requires passwordless SSH -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>

    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
    </property>

    <!-- JournalNode storage directory -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/app/hadoop-2.6.5/data/jn</value>
    </property>
</configuration>
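Once the file is in place, `hdfs getconf -confKey` reports the value Hadoop actually resolves, which catches typos in property names (a misspelled name silently falls back to the default). A small helper sketch:

```shell
# Sketch: print the effective value Hadoop resolves for each key.
show_effective_conf() {
    local key
    for key in "$@"; do
        printf '%s = %s\n' "$key" "$(hdfs getconf -confKey "$key")"
    done
}

# Example:
# show_effective_conf dfs.nameservices dfs.ha.namenodes.mycluster
```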
mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
</configuration>
yarn-site.xml
<configuration>

<!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8035</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>

    <!-- Enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>

    <!-- Keep aggregated logs for 7 days -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
</configuration>
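The 604800 above is just 7 days expressed in seconds:

```shell
# yarn.log-aggregation.retain-seconds: 7 days expressed in seconds
retain_seconds=$((7 * 24 * 60 * 60))
echo "$retain_seconds"    # prints 604800
```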

  Copy the configured Hadoop environment to the other nodes.

root@master:~# rsync -avz /app/hadoop-2.6.5 slave1:/app/
root@master:~# rsync -avz /app/hadoop-2.6.5 slave2:/app/

6. 启动HDFS-HA集群

  • On each JournalNode node, start the journalnode service:
sbin/hadoop-daemon.sh start journalnode
  • On [nn1], format it and start the NameNode:

    bin/hdfs namenode -format
    sbin/hadoop-daemon.sh start namenode
    

    Run the format only once.

  • On [nn2], synchronize nn1's metadata:

bin/hdfs namenode -bootstrapStandby

Run this only once. Like the format, it is a one-time operation: this command is the equivalent of a format, but the result must match nn1. If you format multiple times, the nameservice ends up with multiple clusterIDs and the services will not start; in that case delete the data and format again.
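Since formatting must happen exactly once, a guard like the following avoids accidental re-runs. The VERSION file location follows the hadoop.tmp.dir layout configured above; this is a sketch, not part of the Hadoop distribution:

```shell
# Sketch: only format the NameNode if it has never been formatted.
# A successful format creates current/VERSION under the name dir.
format_namenode_once() {
    local name_dir=${1:-/app/hadoop-2.6.5/tmp/dfs/name}
    if [ -f "$name_dir/current/VERSION" ]; then
        echo "already formatted, skipping"
        return 0
    fi
    hdfs namenode -format
}
```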

  • Start [nn2]
sbin/hadoop-daemon.sh start namenode
Detailed steps
root@master:~# cat /usr/bin/util.sh
#!/bin/bash
# Run jps on every node to list the Java daemons
for i in root@master root@slave1 root@slave2
do
    echo "------------------------- $i ---------------------"
    ssh $i '/usr/lib/jvm/jdk1.8.0_112/bin/jps'
done
root@master:~# chmod +x /usr/bin/util.sh
root@master:/app/hadoop-2.6.5# ./sbin/hadoop-daemons.sh start journalnode
master: starting journalnode, logging to /app/hadoop-2.6.5/logs/hadoop-root-journalnode-master.out
slave1: starting journalnode, logging to /app/hadoop-2.6.5/logs/hadoop-root-journalnode-slave1.out
slave2: starting journalnode, logging to /app/hadoop-2.6.5/logs/hadoop-root-journalnode-slave2.out

Check that the JournalNodes started across the cluster
root@master:/app/hadoop-2.6.5# util.sh 
------------------------- root@master ---------------------
32580 Jps
32330 JournalNode
------------------------- root@slave1 ---------------------
29268 Jps
29132 JournalNode
------------------------- root@slave2 ---------------------
22721 JournalNode
22855 Jps

root@master:/app/hadoop-2.6.5# ./bin/hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

19/10/12 11:22:29 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = master/192.168.2.20
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.6.5
STARTUP_MSG:   classpath = ...(truncated)
STARTUP_MSG:   build = https://github.com/apache/hadoop.git -r e8c9fe0b4c252caf2ebf1464220599650f119997; compiled by 'sjlee' on 2016-10-02T23:43Z
STARTUP_MSG:   java = 1.8.0_112
************************************************************/
19/10/12 11:22:29 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
19/10/12 11:22:29 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-db0aa966-c538-481d-862f-cea4099e8980
19/10/12 11:22:29 INFO namenode.FSNamesystem: No KeyProvider found.
19/10/12 11:22:29 INFO namenode.FSNamesystem: fsLock is fair:true
19/10/12 11:22:29 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
19/10/12 11:22:29 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
19/10/12 11:22:29 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
19/10/12 11:22:29 INFO blockmanagement.BlockManager: The block deletion will start around 2019 Oct 12 11:22:29
19/10/12 11:22:29 INFO util.GSet: Computing capacity for map BlocksMap
19/10/12 11:22:29 INFO util.GSet: VM type       = 64-bit
19/10/12 11:22:29 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
19/10/12 11:22:29 INFO util.GSet: capacity      = 2^21 = 2097152 entries
19/10/12 11:22:29 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
19/10/12 11:22:29 INFO blockmanagement.BlockManager: defaultReplication         = 1
19/10/12 11:22:29 INFO blockmanagement.BlockManager: maxReplication             = 512
19/10/12 11:22:29 INFO blockmanagement.BlockManager: minReplication             = 1
19/10/12 11:22:29 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
19/10/12 11:22:29 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
19/10/12 11:22:29 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
19/10/12 11:22:29 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
19/10/12 11:22:29 INFO namenode.FSNamesystem: fsOwner             = root (auth:SIMPLE)
19/10/12 11:22:29 INFO namenode.FSNamesystem: supergroup          = supergroup
19/10/12 11:22:29 INFO namenode.FSNamesystem: isPermissionEnabled = true
19/10/12 11:22:29 INFO namenode.FSNamesystem: Determined nameservice ID: mycluster
19/10/12 11:22:29 INFO namenode.FSNamesystem: HA Enabled: true
19/10/12 11:22:29 INFO namenode.FSNamesystem: Append Enabled: true
19/10/12 11:22:29 INFO util.GSet: Computing capacity for map INodeMap
19/10/12 11:22:29 INFO util.GSet: VM type       = 64-bit
19/10/12 11:22:29 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
19/10/12 11:22:29 INFO util.GSet: capacity      = 2^20 = 1048576 entries
19/10/12 11:22:29 INFO namenode.NameNode: Caching file names occuring more than 10 times
19/10/12 11:22:29 INFO util.GSet: Computing capacity for map cachedBlocks
19/10/12 11:22:29 INFO util.GSet: VM type       = 64-bit
19/10/12 11:22:29 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
19/10/12 11:22:29 INFO util.GSet: capacity      = 2^18 = 262144 entries
19/10/12 11:22:29 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
19/10/12 11:22:29 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
19/10/12 11:22:29 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
19/10/12 11:22:29 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
19/10/12 11:22:29 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
19/10/12 11:22:29 INFO util.GSet: Computing capacity for map NameNodeRetryCache
19/10/12 11:22:29 INFO util.GSet: VM type       = 64-bit
19/10/12 11:22:29 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
19/10/12 11:22:29 INFO util.GSet: capacity      = 2^15 = 32768 entries
19/10/12 11:22:29 INFO namenode.NNConf: ACLs enabled? false
19/10/12 11:22:29 INFO namenode.NNConf: XAttrs enabled? true
19/10/12 11:22:29 INFO namenode.NNConf: Maximum size of an xattr: 16384
19/10/12 11:22:30 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1468894087-192.168.2.20-1570850550134
19/10/12 11:22:30 INFO common.Storage: Storage directory /app/hadoop-2.6.5/tmp/dfs/name has been successfully formatted.
19/10/12 11:22:30 INFO namenode.FSImageFormatProtobuf: Saving image file /app/hadoop-2.6.5/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
19/10/12 11:22:30 INFO namenode.FSImageFormatProtobuf: Image file /app/hadoop-2.6.5/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 321 bytes saved in 0 seconds.
19/10/12 11:22:30 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
19/10/12 11:22:30 INFO util.ExitUtil: Exiting with status 0
19/10/12 11:22:30 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.2.20
************************************************************/

root@master:/app/hadoop-2.6.5# ./sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /app/hadoop-2.6.5/logs/hadoop-root-namenode-master.out
root@master:/app/hadoop-2.6.5# util.sh 
------------------------- root@master ---------------------
401 NameNode
32330 JournalNode
636 Jps
------------------------- root@slave1 ---------------------
29652 Jps
29498 NameNode
29132 JournalNode
------------------------- root@slave2 ---------------------
22721 JournalNode
23096 Jps
root@master:/app/hadoop-2.6.5# sbin/hadoop-daemons.sh start datanode
master: starting datanode, logging to /app/hadoop-2.6.5/logs/hadoop-root-datanode-master.out
slave1: starting datanode, logging to /app/hadoop-2.6.5/logs/hadoop-root-datanode-slave1.out
slave2: starting datanode, logging to /app/hadoop-2.6.5/logs/hadoop-root-datanode-slave2.out
root@master:/app/hadoop-2.6.5# util.sh 
------------------------- root@master ---------------------
784 DataNode
401 NameNode
968 Jps
32330 JournalNode
------------------------- root@slave1 ---------------------
29906 Jps
29498 NameNode
29132 JournalNode
29758 DataNode
------------------------- root@slave2 ---------------------
22721 JournalNode
23189 DataNode
23327 Jps

Check the web UI

(screenshots: NameNode web UIs on master and slave1)

 In fact the NameNodes on both master and slave1 are standby at this point; both web pages show standby mode

Overview 'master:8020' (standby)
Overview 'slave1:8020' (standby)
Check from the command line
root@master:/app/hadoop-2.6.5# bin/hdfs haadmin -getServiceState nn1
standby

root@slave1:/app/hadoop-2.6.5# bin/hdfs haadmin -getServiceState nn2
standby

At this point the HDFS filesystem cannot be used either.


  • Switch [nn1] to Active
root@master:/app/hadoop-2.6.5# bin/hdfs haadmin -transitionToActive nn1
  • Check whether nn1 is Active


root@master:/app/hadoop-2.6.5# bin/hdfs haadmin -getServiceState nn1
active

7. HDFS-HA Manual Failover

  Manually simulating an HA switch.

root@master:/app/hadoop-2.6.5# sbin/hadoop-daemon.sh stop namenode
stopping namenode
root@master:/app/hadoop-2.6.5# bin/hdfs haadmin -getServiceState nn2
standby
root@master:/app/hadoop-2.6.5# bin/hdfs haadmin -transitionToActive nn2
19/10/12 13:49:15 INFO ipc.Client: Retrying connect to server: master/192.168.2.20:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Unexpected error occurred  Call From master/192.168.2.20 to master:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
Usage: HAAdmin [-transitionToActive <serviceId> [--forceactive]]
root@master:/app/hadoop-2.6.5# sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /app/hadoop-2.6.5/logs/hadoop-root-namenode-master.out
root@master:/app/hadoop-2.6.5# bin/hdfs haadmin -transitionToActive nn2
root@master:/app/hadoop-2.6.5# bin/hdfs haadmin -getServiceState nn2
active
root@master:/app/hadoop-2.6.5# bin/hdfs haadmin -getServiceState nn1
standby

When switching HDFS HA states, both NameNodes must be running, otherwise the switch fails.
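A small helper makes it easy to watch both states before and after each switch (the serviceIds nn1/nn2 come from hdfs-site.xml; the helper itself is a sketch):

```shell
# Sketch: print the HA state of every configured NameNode at once.
print_ha_states() {
    local id
    for id in nn1 nn2; do
        printf '%s: %s\n' "$id" \
            "$(hdfs haadmin -getServiceState "$id" 2>/dev/null || echo unreachable)"
    done
}
```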

8. HDFS-HA Automatic Failover

Summary

1.  Configuration
    (1) Add to hdfs-site.xml:
        <property>
            <name>dfs.ha.automatic-failover.enabled</name>
            <value>true</value>
        </property>
    (2) Add to core-site.xml:
        <property>
            <name>ha.zookeeper.quorum</name>
            <value>master:2181,slave1:2181,slave2:2181</value>
        </property>
2.  Startup
    (1) Stop all HDFS services:
        sbin/stop-dfs.sh
    (2) Start the ZooKeeper cluster:
        bin/zkServer.sh start
    (3) Initialize the HA state in ZooKeeper:
        bin/hdfs zkfc -formatZK
    (4) Start HDFS:
        sbin/start-dfs.sh
    (5) Start the DFSZKFailoverController (zkfc) on each NameNode node; whichever machine starts it first gets the Active NameNode:
        sbin/hadoop-daemon.sh start zkfc
3.  Verification
    (1) kill -9 the Active NameNode process:
        kill -9 <namenode pid>
    (2) Disconnect the Active NameNode machine from the network:
        service network stop
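When running the kill -9 test, the takeover can be watched with a polling loop until the standby becomes Active. A sketch; the retry count and sleep interval are arbitrary choices:

```shell
# Sketch: poll until the given NameNode reports "active", or give up.
wait_for_active() {
    local service_id=$1 tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        if [ "$(hdfs haadmin -getServiceState "$service_id" 2>/dev/null)" = "active" ]; then
            echo "$service_id is now active"
            return 0
        fi
        sleep 2
        tries=$((tries - 1))
    done
    echo "timed out waiting for $service_id"
    return 1
}

# Example after killing the Active NameNode: wait_for_active nn2
```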
  • Configure HDFS-HA automatic failover

Add to hdfs-site.xml:

<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
</property>
??? note "hdfs-site.xml"
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>

    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
    </property>

    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>master:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>slave1:8020</value>
    </property>

    <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>master:50070</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>slave1:50070</value>
    </property>

    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://master:8485;slave1:8485;slave2:8485/mycluster</value>
    </property>

    <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <!-- Disable permission checking -->
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>

    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>

    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
    </property>

    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
</configuration>

  • Add to core-site.xml:
core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
    </property>

    <property>
        <name>hadoop.tmp.dir</name>
    <value>/app/hadoop-2.6.5/tmp</value>
    </property>

    <property>
        <name>ha.zookeeper.quorum</name>
        <value>master:2181,slave1:2181,slave2:2181</value>
    </property>
</configuration>
  • Start the ZooKeeper cluster
root@master:~# /usr/local/zookeeper/bin/zkServer.sh start
JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
root@master:~# /usr/local/zookeeper/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower

root@slave1:/app/hadoop-2.6.5# /usr/local/zookeeper/bin/zkServer.sh start
JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
root@slave1:/app/hadoop-2.6.5# /usr/local/zookeeper/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: leader

root@slave2:~# /usr/local/zookeeper/bin/zkServer.sh start
JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
root@slave2:~# /usr/local/zookeeper/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower
  • Stop all HDFS services:
root@master:~# cd /app/hadoop-2.6.5/
root@master:/app/hadoop-2.6.5# ./sbin/stop-dfs.sh
Stopping namenodes on [master slave1]
master: stopping namenode
slave1: stopping namenode
master: stopping datanode
slave1: stopping datanode
slave2: stopping datanode
Stopping journal nodes [master slave1 slave2]
master: stopping journalnode
slave1: stopping journalnode
slave2: stopping journalnode
Stopping ZK Failover Controllers on NN hosts [master slave1]
master: no zkfc to stop
slave1: no zkfc to stop
  • Initialize the HA state in ZooKeeper
bin/hdfs zkfc -formatZK
Initialize ZooKeeper
root@master:/app/hadoop-2.6.5# bin/hdfs zkfc -formatZK
19/10/12 14:32:42 INFO tools.DFSZKFailoverController: Failover controller configured for NameNode NameNode at master/192.168.2.20:8020
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Client environment:host.name=master
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Client environment:java.version=1.8.0_112
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/jdk1.8.0_112/jre
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/app/hadoop-2.6.5/etc/hadoop:/app/hadoop-2.6.5/share/hadoop/com
mon/lib/activation-1.1.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/jetty-util-6.1.26.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/apacheds-i18n-2.0.0-M15.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/hadoop-annotations-2.6.5.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/netty-3.6.2.Final.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/jersey-core-1.9.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/jetty-6.1.26.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/commons-compress-1.4.1.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/curator-client-2.6.0.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/paranamer-2.3.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/commons-net-3.1.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/slf4j-api-1.7.5.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/curator-recipes-2.6.0.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/commons-io-2.4.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/jsp-api-2.1.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/jackson-xc-1.9.13.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/avro-1.7.4.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/jets3t-0.9.0.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/commons-el-1.0.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/commons-math3-3.1.1.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/commons-configuration-1.6.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/junit-4.11.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/stax-api-1.0-2.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/asm-3.2.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/jasper-runtime-5.5.23.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/mockito-all-1.8.5.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/jsch-0.1.42.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/jackson-map
per-asl-1.9.13.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/servlet-api-2.5.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/curator-framework-2.6.0.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/httpclient-4.2.5.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/guava-11.0.2.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/jersey-server-1.9.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/jasper-compiler-5.5.23.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/java-xmlbuilder-0.4.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/zookeeper-3.4.6.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/xz-1.0.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/commons-digester-1.8.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/htrace-core-3.0.4.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/commons-lang-2.6.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/log4j-1.2.17.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/commons-codec-1.4.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/commons-logging-1.1.3.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/gson-2.2.4.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/commons-beanutils-core-1.8.0.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/jersey-json-1.9.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/jsr305-1.3.9.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/api-util-1.0.0-M20.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/hamcrest-core-1.3.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/xmlenc-0.52.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/hadoop-auth-2.6.5.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/jaxb-api-2.2.2.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/httpcore-4.2.5.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/api-asn1-api-1.0.0-M20.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/jettison-1.1.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/commons-beanutils-1.7.0.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/snappy-java-1.0.4.1.jar:/app/ha
doop-2.6.5/share/hadoop/common/lib/commons-httpclient-3.1.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/commons-cli-1.2.jar:/app/hadoop-2.6.5/share/hadoop/common/lib/commons-collections-3.2.2.jar:/app/hadoop-2.6.5/share/hadoop/common/hadoop-common-2.6.5.jar:/app/hadoop-2.6.5/share/hadoop/common/hadoop-common-2.6.5-tests.jar:/app/hadoop-2.6.5/share/hadoop/common/hadoop-nfs-2.6.5.jar:/app/hadoop-2.6.5/share/hadoop/hdfs:/app/hadoop-2.6.5/share/hadoop/hdfs/lib/jetty-util-6.1.26.jar:/app/hadoop-2.6.5/share/hadoop/hdfs/lib/netty-3.6.2.Final.jar:/app/hadoop-2.6.5/share/hadoop/hdfs/lib/jersey-core-1.9.jar:/app/hadoop-2.6.5/share/hadoop/hdfs/lib/jetty-6.1.26.jar:/app/hadoop-2.6.5/share/hadoop/hdfs/lib/xml-apis-1.3.04.jar:/app/hadoop-2.6.5/share/hadoop/hdfs/lib/commons-io-2.4.jar:/app/hadoop-2.6.5/share/hadoop/hdfs/lib/jsp-api-2.1.jar:/app/hadoop-2.6.5/share/hadoop/hdfs/lib/commons-el-1.0.jar:/app/hadoop-2.6.5/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/app/hadoop-2.6.5/share/hadoop/hdfs/lib/asm-3.2.jar:/app/hadoop-2.6.5/share/hadoop/hdfs/lib/jasper-runtime-5.5.23.jar:/app/hadoop-2.6.5/share/hadoop/hdfs/lib/jackson-core-asl-1.9.13.jar:/app/hadoop-2.6.5/share/hadoop/hdfs/lib/jackson-mapper-asl-1.9.13.jar:/app/hadoop-2.6.5/share/hadoop/hdfs/lib/servlet-api-2.5.jar:/app/hadoop-2.6.5/share/hadoop/hdfs/lib/guava-11.0.2.jar:/app/hadoop-2.6.5/share/hadoop/hdfs/lib/jersey-server-1.9.jar:/app/hadoop-2.6.5/share/hadoop/hdfs/lib/htrace-core-3.0.4.jar:/app/hadoop-2.6.5/share/hadoop/hdfs/lib/commons-lang-2.6.jar:/app/hadoop-2.6.5/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/app/hadoop-2.6.5/share/hadoop/hdfs/lib/commons-codec-1.4.jar:/app/hadoop-2.6.5/share/hadoop/hdfs/lib/commons-logging-1.1.3.jar:/app/hadoop-2.6.5/share/hadoop/hdfs/lib/xercesImpl-2.9.1.jar:/app/hadoop-2.6.5/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/app/hadoop-2.6.5/share/hadoop/hdfs/lib/jsr305-1.3.9.jar:/app/hadoop-2.6.5/share/hadoop/hdfs/lib/xmlenc-0.52.jar:/app/hadoop-2.6.5/share/hadoop/hdfs/lib/commons-cli-1.2.j
ar:/app/hadoop-2.6.5/share/hadoop/hdfs/hadoop-hdfs-nfs-2.6.5.jar:/app/hadoop-2.6.5/share/hadoop/hdfs/hadoop-hdfs-2.6.5.jar:/app/hadoop-2.6.5/share/hadoop/hdfs/hadoop-hdfs-2.6.5-tests.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/activation-1.1.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/jetty-util-6.1.26.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/netty-3.6.2.Final.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/jersey-core-1.9.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/leveldbjni-all-1.8.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/jetty-6.1.26.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/commons-compress-1.4.1.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/commons-io-2.4.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/jline-0.9.94.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/jackson-xc-1.9.13.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/jersey-guice-1.9.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/aopalliance-1.0.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/javax.inject-1.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/stax-api-1.0-2.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/asm-3.2.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/jackson-core-asl-1.9.13.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/jackson-mapper-asl-1.9.13.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/servlet-api-2.5.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/guava-11.0.2.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/guice-servlet-3.0.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/jersey-server-1.9.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/zookeeper-3.4.6.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/jackson-jaxrs-1.9.13.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/xz-1.0.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/commons-lang-2.6.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/log4j-1.2.17.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/commons-codec-1.4.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/commons-logging-1.1.3.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/jersey-js
on-1.9.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/jsr305-1.3.9.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/jaxb-impl-2.2.3-1.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/jaxb-api-2.2.2.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/guice-3.0.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/jettison-1.1.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/commons-httpclient-3.1.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/jersey-client-1.9.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/commons-cli-1.2.jar:/app/hadoop-2.6.5/share/hadoop/yarn/lib/commons-collections-3.2.2.jar:/app/hadoop-2.6.5/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-2.6.5.jar:/app/hadoop-2.6.5/share/hadoop/yarn/hadoop-yarn-common-2.6.5.jar:/app/hadoop-2.6.5/share/hadoop/yarn/hadoop-yarn-registry-2.6.5.jar:/app/hadoop-2.6.5/share/hadoop/yarn/hadoop-yarn-server-tests-2.6.5.jar:/app/hadoop-2.6.5/share/hadoop/yarn/hadoop-yarn-server-common-2.6.5.jar:/app/hadoop-2.6.5/share/hadoop/yarn/hadoop-yarn-server-web-proxy-2.6.5.jar:/app/hadoop-2.6.5/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-2.6.5.jar:/app/hadoop-2.6.5/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.6.5.jar:/app/hadoop-2.6.5/share/hadoop/yarn/hadoop-yarn-client-2.6.5.jar:/app/hadoop-2.6.5/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.6.5.jar:/app/hadoop-2.6.5/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.6.5.jar:/app/hadoop-2.6.5/share/hadoop/yarn/hadoop-yarn-api-2.6.5.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/lib/hadoop-annotations-2.6.5.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/lib/netty-3.6.2.Final.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/lib/jersey-core-1.9.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/lib/leveldbjni-all-1.8.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/lib/commons-compress-1.4.1.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/lib/paranamer-2.3.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/lib/commons-io-2.4.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/lib/jersey-guice-1.9
.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/lib/aopalliance-1.0.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/lib/avro-1.7.4.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/lib/protobuf-java-2.5.0.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/lib/javax.inject-1.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/lib/junit-4.11.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/lib/asm-3.2.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/lib/jackson-core-asl-1.9.13.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/lib/jackson-mapper-asl-1.9.13.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/lib/guice-servlet-3.0.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/lib/jersey-server-1.9.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/lib/xz-1.0.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/lib/log4j-1.2.17.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/lib/hamcrest-core-1.3.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/lib/guice-3.0.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/lib/snappy-java-1.0.4.1.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.6.5.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.6.5.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.6.5.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-2.6.5.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.5.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.5-tests.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.6.5.jar:/app/hadoop-2.6.5/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-plugins-2.6.5.jar:/contrib/capacity-scheduler/*.jar
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/app/hadoop-2.6.5/lib/native
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Client environment:os.version=5.0.0-23-generic
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Client environment:user.name=root
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Client environment:user.home=/root
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Client environment:user.dir=/app/hadoop-2.6.5
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181,slave1:2181,slave2:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@77847718
19/10/12 14:32:42 INFO zookeeper.ClientCnxn: Opening socket connection to server slave2/192.168.2.22:2181. Will not attempt to authenticate using SASL (unknown error)
19/10/12 14:32:42 INFO zookeeper.ClientCnxn: Socket connection established to slave2/192.168.2.22:2181, initiating session
19/10/12 14:32:43 INFO zookeeper.ClientCnxn: Session establishment complete on server slave2/192.168.2.22:2181, sessionid = 0x36dbea61b6c0000, negotiated timeout = 5000
19/10/12 14:32:43 INFO ha.ActiveStandbyElector: Session connected.
19/10/12 14:32:43 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/mycluster in ZK.
19/10/12 14:32:43 INFO zookeeper.ZooKeeper: Session: 0x36dbea61b6c0000 closed
19/10/12 14:32:43 INFO zookeeper.ClientCnxn: EventThread shut down

Check ZooKeeper

root@master:~# /usr/local/zookeeper/bin/zkCli.sh 
Connecting to localhost:2181
2019-10-12 14:34:27,526 [myid:] - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
2019-10-12 14:34:27,528 [myid:] - INFO  [main:Environment@100] - Client environment:host.name=master
2019-10-12 14:34:27,528 [myid:] - INFO  [main:Environment@100] - Client environment:java.version=1.8.0_112
2019-10-12 14:34:27,528 [myid:] - INFO  [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2019-10-12 14:34:27,528 [myid:] - INFO  [main:Environment@100] - Client environment:java.home=/usr/lib/jvm/jdk1.8.0_112/jre
2019-10-12 14:34:27,528 [myid:] - INFO  [main:Environment@100] - Client environment:java.class.path=/usr/local/zookeeper/bin/../build/classes:/usr/local/zookeeper/bin/../build/lib/*.jar:/usr/local/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/local/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/usr/local/zookeeper/bin/../lib/netty-3.2.2.Final.jar:/usr/local/zookeeper/bin/../lib/log4j-1.2.15.jar:/usr/local/zookeeper/bin/../lib/jline-0.9.94.jar:/usr/local/zookeeper/bin/../zookeeper-3.4.5.jar:/usr/local/zookeeper/bin/../src/java/lib/*.jar:/usr/local/zookeeper/bin/../conf:.:/usr/lib/jvm/jdk1.8.0_112/lib:/usr/lib/jvm/jdk1.8.0_112/jre/lib
2019-10-12 14:34:27,529 [myid:] - INFO  [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2019-10-12 14:34:27,529 [myid:] - INFO  [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2019-10-12 14:34:27,529 [myid:] - INFO  [main:Environment@100] - Client environment:java.compiler=<NA>
2019-10-12 14:34:27,529 [myid:] - INFO  [main:Environment@100] - Client environment:os.name=Linux
2019-10-12 14:34:27,529 [myid:] - INFO  [main:Environment@100] - Client environment:os.arch=amd64
2019-10-12 14:34:27,529 [myid:] - INFO  [main:Environment@100] - Client environment:os.version=5.0.0-23-generic
2019-10-12 14:34:27,529 [myid:] - INFO  [main:Environment@100] - Client environment:user.name=root
2019-10-12 14:34:27,529 [myid:] - INFO  [main:Environment@100] - Client environment:user.home=/root
2019-10-12 14:34:27,530 [myid:] - INFO  [main:Environment@100] - Client environment:user.dir=/root
2019-10-12 14:34:27,530 [myid:] - INFO  [main:ZooKeeper@438] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@42110406
Welcome to ZooKeeper!
2019-10-12 14:34:27,559 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@966] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2019-10-12 14:34:27,595 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@849] - Socket connection established to localhost/127.0.0.1:2181, initiating session
2019-10-12 14:34:27,629 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1207] - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x16dbea616f50000, negotiated timeout = 30000
WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0] ls /
[cluster, storm, brokers, zookeeper, hadoop-ha, admin, isr_change_notification, log_dir_event_notification, controller_epoch, name, consumers, latest_producer_id_block, config, hbase]
[zk: localhost:2181(CONNECTED) 1] ls /hadoop-ha
[mycluster]
[zk: localhost:2181(CONNECTED) 2] ls /hadoop-ha/mycluster
[]
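The `/hadoop-ha/mycluster` znode is still empty at this point; the `ActiveStandbyElectorLock` child only appears once the ZKFC daemons are running. For scripted checks it can help to flatten zkCli's bracketed `ls` output into one znode per line. The helper below is an assumption for illustration, not part of Hadoop or ZooKeeper:

```shell
# Hypothetical helper: convert zkCli "ls" output such as
# "[cluster, hadoop-ha, zookeeper]" into one znode name per line,
# so the result can be piped into grep.
parse_zk_ls() {
  tr -d '[]' | tr ',' '\n' | sed -e 's/^ *//' -e '/^$/d'
}

# usage (non-interactive zkCli call):
#   /usr/local/zookeeper/bin/zkCli.sh ls /hadoop-ha | parse_zk_ls
```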

  • Distribute the configuration
rsync -avz /app/hadoop-2.6.5/etc/hadoop/* slave1:/app/hadoop-2.6.5/etc/hadoop/
rsync -avz /app/hadoop-2.6.5/etc/hadoop/* slave2:/app/hadoop-2.6.5/etc/hadoop/
Distribute the modified configuration files
root@master:/app/hadoop-2.6.5# rsync -avz /app/hadoop-2.6.5/etc/hadoop/* slave1:/app/hadoop-2.6.5/etc/hadoop/
sending incremental file list
core-site.xml
hdfs-site.xml

sent 785 bytes  received 90 bytes  1,750.00 bytes/sec
total size is 80,979  speedup is 92.55

root@master:/app/hadoop-2.6.5# rsync -avz /app/hadoop-2.6.5/etc/hadoop/* slave2:/app/hadoop-2.6.5/etc/hadoop/
sending incremental file list
core-site.xml
hdfs-site.xml

sent 1,067 bytes  received 90 bytes  2,314.00 bytes/sec
total size is 80,979  speedup is 69.99
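With more worker nodes, the per-host rsync invocations above get repetitive. They can be wrapped in a loop; the sketch below assumes the node list and paths of this cluster, and adds a `DRY_RUN` switch so it can be checked without touching the hosts:

```shell
# Hypothetical helper: rsync the Hadoop config directory to every node
# in NODES. With DRY_RUN=1 the commands are only printed, not executed.
NODES="slave1 slave2"
CONF_DIR=/app/hadoop-2.6.5/etc/hadoop

sync_conf() {
  for node in $NODES; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo rsync -avz "$CONF_DIR/" "$node:$CONF_DIR/"
    else
      rsync -avz "$CONF_DIR/" "$node:$CONF_DIR/"
    fi
  done
}
```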
  • Start HDFS
sbin/start-dfs.sh
Start HDFS
root@master:/app/hadoop-2.6.5# sbin/start-dfs.sh
Starting namenodes on [master slave1]
master: starting namenode, logging to /app/hadoop-2.6.5/logs/hadoop-root-namenode-master.out
slave1: starting namenode, logging to /app/hadoop-2.6.5/logs/hadoop-root-namenode-slave1.out
master: starting datanode, logging to /app/hadoop-2.6.5/logs/hadoop-root-datanode-master.out
slave1: starting datanode, logging to /app/hadoop-2.6.5/logs/hadoop-root-datanode-slave1.out
slave2: starting datanode, logging to /app/hadoop-2.6.5/logs/hadoop-root-datanode-slave2.out
Starting journal nodes [master slave1 slave2]
master: starting journalnode, logging to /app/hadoop-2.6.5/logs/hadoop-root-journalnode-master.out
slave1: starting journalnode, logging to /app/hadoop-2.6.5/logs/hadoop-root-journalnode-slave1.out
slave2: starting journalnode, logging to /app/hadoop-2.6.5/logs/hadoop-root-journalnode-slave2.out
Starting ZK Failover Controllers on NN hosts [master slave1]
master: starting zkfc, logging to /app/hadoop-2.6.5/logs/hadoop-root-zkfc-master.out
slave1: starting zkfc, logging to /app/hadoop-2.6.5/logs/hadoop-root-zkfc-slave1.out
  • Verify
root@master:/app/hadoop-2.6.5# bin/hdfs haadmin -getServiceState nn1
active
root@master:/app/hadoop-2.6.5# bin/hdfs haadmin -getServiceState nn2
standby
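Checking each NameNode id by hand works, but a small wrapper makes the check repeatable. This is only a sketch, assuming the ids nn1/nn2 configured for this cluster and relying solely on `hdfs haadmin -getServiceState`:

```shell
# Hypothetical helper: print the HA state of each NameNode and report
# which id currently holds the active role.
active_nn() {
  local active=""
  for id in nn1 nn2; do
    local state
    state=$(hdfs haadmin -getServiceState "$id" 2>/dev/null)
    echo "$id: $state"
    [ "$state" = "active" ] && active=$id
  done
  echo "active: $active"
}
```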

root@master:/app/hadoop-2.6.5# util.sh 
------------------------- root@master ---------------------
8401 DFSZKFailoverController
7890 DataNode
8628 Jps
8153 JournalNode
4938 QuorumPeerMain
7726 NameNode
------------------------- root@slave1 ---------------------
2466 Jps
1892 NameNode
2020 DataNode
2165 JournalNode
2341 DFSZKFailoverController
32078 QuorumPeerMain
------------------------- root@slave2 ---------------------
23977 QuorumPeerMain
25274 Jps
25002 DataNode
25147 JournalNode
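The `util.sh` script used above is not shown in this document; it is presumably something like the following reconstruction (an assumption): a loop that runs `jps` on every node over ssh and prints a header per host.

```shell
# Hypothetical reconstruction of util.sh: show the Java processes on
# every cluster node by running jps over ssh.
cluster_jps() {
  for host in master slave1 slave2; do
    echo "------------------------- root@$host ---------------------"
    ssh "root@$host" jps
  done
}
```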
  • Test

  At this point the active NameNode is running on master. Kill the active NameNode process on master and check whether automatic failover occurs.
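The kill-and-verify procedure can be scripted as a sketch like the one below. It assumes nn2 (on slave1) is the standby id and that ZKFC needs a few seconds to detect the failure; the timing is an assumption, not a guarantee:

```shell
# Hypothetical failover test: kill the local (active) NameNode, wait
# for ZKFC to notice, then check that the former standby took over.
kill_active_and_verify() {
  local pid
  pid=$(jps | awk '/ NameNode$/ {print $1}')   # pid of the NameNode JVM
  kill -9 "$pid"
  sleep 10                                     # allow ZKFC time to fail over
  hdfs haadmin -getServiceState nn2            # should now print "active"
}
```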