Hadoop HA
1. Overview

- 1) HA (High Availability) means the service stays up around the clock (7x24, no interruptions).
- 2) The key strategy for achieving high availability is eliminating single points of failure. Strictly speaking, HA breaks down into a mechanism per component: HDFS HA and YARN HA.
- 3) Before Hadoop 2.0, the NameNode was a single point of failure (SPOF) in an HDFS cluster.
- 4) The NameNode affects HDFS cluster availability in two main ways:
    - If the NameNode machine fails unexpectedly (e.g. crashes), the cluster is unusable until an administrator restarts it.
    - If the NameNode machine needs a software or hardware upgrade, the cluster is likewise unusable for the duration.

HDFS HA solves both problems by configuring two NameNodes in an Active/Standby pair, giving the cluster a hot standby for the NameNode. On a failure such as a machine crash, or for planned maintenance, the NameNode role can be switched quickly to the other machine.
2. How HDFS-HA works

2.1 Key points

- Metadata management has to change:
    - Each NameNode keeps its own copy of the metadata in memory.
    - Only the NameNode in Active state may write to the edits log.
    - Both NameNodes can read the edits log.
    - The shared edits live in shared storage (qjournal and NFS are the two mainstream implementations).
- A state-management module is needed:
    - A zkfailover process runs on every NameNode host. Each zkfailover monitors the NameNode on its own node and records its state in ZooKeeper. When a state transition is required, zkfailover performs the switch, taking care to prevent split-brain.
- Passwordless SSH must work between the two NameNodes (a quick check is sketched after this list).
- Fencing: at any moment, only one NameNode serves clients.
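Since the sshfence method used later logs in over SSH to fence the other NameNode, it is worth confirming passwordless SSH in both directions up front. A minimal check, assuming the master/slave1 hostnames used below and root's key at /root/.ssh/id_rsa:

```bash
# Run on master: should print "slave1" without prompting for a password
ssh -o BatchMode=yes root@slave1 hostname

# Run on slave1: should print "master" without prompting for a password
ssh -o BatchMode=yes root@master hostname

# If either command prompts or fails, push the public key first, e.g. on master:
ssh-copy-id root@slave1
```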
2.2 How HDFS-HA automatic failover works

The command hdfs haadmin -failover performs a manual failover. In manual mode, even if the Active NameNode has already failed, the system will not fail over to the standby on its own. This section covers how to configure and deploy automatic failover. Automatic failover adds two new components to an HDFS deployment: ZooKeeper and the ZKFailoverController (ZKFC) process. ZooKeeper is a highly available service for maintaining small amounts of coordination data, notifying clients when that data changes, and monitoring clients for failure. HA automatic failover relies on the following ZooKeeper features:

- 1) Failure detection: each NameNode in the cluster maintains a persistent session in ZooKeeper. If the machine crashes, the session expires and ZooKeeper notifies the other NameNode that a failover should be triggered.
- 2) Active NameNode election: ZooKeeper provides a simple mechanism to elect exactly one node as active. If the current Active NameNode crashes, another node can acquire a special exclusive lock in ZooKeeper to indicate that it should become the Active NameNode.

ZKFC is the other new component in automatic failover. It is a ZooKeeper client that also monitors and manages the state of the NameNode. Every host that runs a NameNode also runs a ZKFC process. ZKFC is responsible for:

- 1) Health monitoring: ZKFC periodically pings the NameNode on the same host with a health-check command. As long as the NameNode responds in time with a healthy status, ZKFC considers the node healthy. If the node crashes, freezes, or otherwise becomes unhealthy, the health monitor marks it as unhealthy.
- 2) ZooKeeper session management: while the local NameNode is healthy, ZKFC keeps a session open in ZooKeeper. If the local NameNode is active, ZKFC also holds a special lock znode. The lock uses ZooKeeper's ephemeral-node support, so if the session expires, the lock node is deleted automatically (the lock is inspectable from zkCli, as sketched after this list).
- 3) ZooKeeper-based election: if the local NameNode is healthy and ZKFC sees that no other node currently holds the lock znode, it tries to acquire the lock for itself. If it succeeds, it has won the election and runs the failover process to make its local NameNode active. The failover process is similar to the manual failover described above: first fence the previous Active NameNode if necessary, then transition the local NameNode to Active.
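To make the lock concrete: once automatic failover is configured (section 8 below), the election state is visible in ZooKeeper. A sketch of inspecting it with zkCli.sh, assuming the mycluster nameservice configured below; the znode names ActiveBreadCrumb and ActiveStandbyElectorLock are what Hadoop 2.x creates, but verify on your version:

```bash
# Connect to any ZooKeeper server in the ensemble
/usr/local/zookeeper/bin/zkCli.sh -server master:2181

# Inside the zkCli shell:
#   ls /hadoop-ha/mycluster
#   [ActiveBreadCrumb, ActiveStandbyElectorLock]
#   get /hadoop-ha/mycluster/ActiveStandbyElectorLock
#   ... shows which NameNode (nn1 or nn2) holds the ephemeral lock right now
```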

3. HDFS-HA cluster setup

IP | Hostname | Services |
---|---|---|
192.168.2.20 | master | NameNode, JournalNode, DataNode, ZK, NodeManager |
192.168.2.21 | slave1 | NameNode, JournalNode, DataNode, ZK, ResourceManager, NodeManager |
192.168.2.22 | slave2 | JournalNode, DataNode, ZK, NodeManager |
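All of the configuration below refers to nodes by hostname, so every machine must resolve master, slave1 and slave2. A sketch of the assumed /etc/hosts entries on each node:

```bash
# Append to /etc/hosts on master, slave1 and slave2 (assumed mapping)
cat >> /etc/hosts <<'EOF'
192.168.2.20 master
192.168.2.21 slave1
192.168.2.22 slave2
EOF
```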
4. Configure the ZooKeeper cluster

Omitted here; see the earlier ZooKeeper notes. A minimal configuration sketch follows.
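For completeness, a minimal sketch of what that ZooKeeper setup looks like. The install path and client port are assumptions matching the zkServer.sh output later on this page (/usr/local/zookeeper, port 2181); the dataDir is hypothetical:

```bash
# conf/zoo.cfg, identical on all three nodes (sketch)
cat > /usr/local/zookeeper/conf/zoo.cfg <<'EOF'
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/zookeeper/data
clientPort=2181
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
EOF

# Each server needs a unique id in dataDir/myid: 1 on master, 2 on slave1, 3 on slave2
mkdir -p /usr/local/zookeeper/data
echo 1 > /usr/local/zookeeper/data/myid   # use 2 / 3 on the other nodes
```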
5. Configure the HDFS-HA cluster

Fresh deployment:

```bash
root@master:~# mkdir /app
root@slave1:~# mkdir /app
root@slave2:~# mkdir /app
root@master:~# tar xf /usr/local/src/hadoop-2.6.5.tar.gz -C /app/
root@master:~# mkdir -p /app/hadoop-2.6.5/dfs/{name,data}
root@master:~# mkdir /app/hadoop-2.6.5/data
root@master:~# mkdir /app/hadoop-2.6.5/tmp
root@master:~# cd /app/hadoop-2.6.5/etc/hadoop/
root@master:/app/hadoop-2.6.5/etc/hadoop# ls *.sh
hadoop-env.sh  httpfs-env.sh  kms-env.sh  mapred-env.sh  yarn-env.sh
```
Configure JAVA_HOME:

```bash
root@master:/app/hadoop-2.6.5/etc/hadoop# which java
/usr/lib/jvm/jdk1.8.0_112/bin/java
root@master:/app/hadoop-2.6.5/etc/hadoop# find ./ -name "*.sh" | xargs grep -irn jdk1.8.0_112
./yarn-env.sh:23:export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_112
./yarn-env.sh:26:  JAVA_HOME=/usr/lib/jvm/jdk1.8.0_112
./hadoop-env.sh:25:export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_112
./mapred-env.sh:16:export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_112
```
Edit the configuration files:

```bash
root@master:/app/hadoop-2.6.5/etc/hadoop# ll core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml
-rw-rw-r-- 1 cmz  cmz   989 10月 12 11:21 core-site.xml
-rw-rw-r-- 1 cmz  cmz  2335 10月 12 11:11 hdfs-site.xml
-rw-r--r-- 1 root root 1103 10月 12 10:27 mapred-site.xml
-rw-rw-r-- 1 cmz  cmz  1895 10月 12 10:22 yarn-site.xml
```
core-site.xml
```xml
<configuration>
    <!-- Assemble the addresses of the two NameNodes into one nameservice, mycluster -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
    </property>
    <!-- Directory for files Hadoop generates at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/app/hadoop-2.6.5/tmp</value>
    </property>
</configuration>
```
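With fs.defaultFS pointing at the nameservice rather than a single host, clients stop caring which NameNode is active: paths resolve through the failover proxy provider configured below. Once the cluster is running (section 6), a quick sanity check might look like:

```bash
# Both forms address the HA nameservice, not a specific NameNode host
bin/hdfs dfs -ls hdfs://mycluster/
bin/hdfs dfs -ls /    # relative to fs.defaultFS, i.e. hdfs://mycluster
```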
hdfs-site.xml
```xml
<configuration>
    <!-- Replication: 1 for this experiment; use 3 in production -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <!-- Logical name of the nameservice -->
    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>
    <!-- The NameNodes in the cluster -->
    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
    </property>
    <!-- RPC address of nn1 -->
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>master:8020</value>
    </property>
    <!-- RPC address of nn2 -->
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>slave1:8020</value>
    </property>
    <!-- HTTP address of nn1 -->
    <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>master:50070</value>
    </property>
    <!-- HTTP address of nn2 -->
    <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>slave1:50070</value>
    </property>
    <!-- Where the NameNodes' shared edits are stored on the JournalNodes -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://master:8485;slave1:8485;slave2:8485/mycluster</value>
    </property>
    <!-- Proxy provider clients use to find the Active NameNode -->
    <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!-- Disable permission checking -->
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <!-- Fencing, so only one NameNode serves clients at a time; sshfence needs passwordless SSH -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
    </property>
    <!-- Local directory where each JournalNode stores its edits -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/app/hadoop-2.6.5/data/jn</value>
    </property>
</configuration>
```
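A quick way to confirm the file is being picked up, using the standard hdfs getconf tool; expected values (assuming the config above) are shown as comments:

```bash
bin/hdfs getconf -confKey dfs.nameservices            # mycluster
bin/hdfs getconf -confKey dfs.ha.namenodes.mycluster  # nn1,nn2
```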
mapred-site.xml
```xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
</configuration>
```
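The two jobhistory addresses only take effect once the JobHistory Server is running; it is not started by start-dfs.sh or start-yarn.sh. A sketch, using the helper script shipped with Hadoop 2.x:

```bash
# On master (the host named in mapreduce.jobhistory.address)
sbin/mr-jobhistory-daemon.sh start historyserver
# The web UI should then answer on http://master:19888
```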
yarn-site.xml
```xml
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8035</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>
    <!-- Enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <!-- Retain aggregated logs for 7 days -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
</configuration>
```
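With log aggregation enabled, the container logs of finished applications are gathered into HDFS and can be read back through the yarn CLI instead of hunting through NodeManager disks. A sketch (the application ID is a placeholder):

```bash
# <application_id> is hypothetical, e.g. application_1570850550134_0001
yarn logs -applicationId <application_id>
```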
Copy the configured Hadoop tree to the other nodes:

```bash
root@master:~# rsync -avz /app/hadoop-2.6.5 slave1:/app/
root@master:~# rsync -avz /app/hadoop-2.6.5 slave2:/app/
```
6. Start the HDFS-HA cluster

- On each JournalNode node, start the journalnode service:

```bash
sbin/hadoop-daemon.sh start journalnode
```

- On [nn1], format the NameNode and start it:

```bash
bin/hdfs namenode -format
sbin/hadoop-daemon.sh start namenode
```

Format only once.

- On [nn2], sync the metadata from nn1:

```bash
bin/hdfs namenode -bootstrapStandby
```

Run this only once, just like formatting. The command is effectively a format for nn2, except that the result must match nn1. If you format repeatedly, the nameservice ends up with multiple IDs and the daemons will not start; in that case, delete the data and reformat (a cleanup sketch follows this list).

- Start the NameNode on [nn2]:

```bash
sbin/hadoop-daemon.sh start namenode
```
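If a repeated format has already left the two NameNodes with mismatched IDs, a cleanup sketch, assuming the directories configured on this page (hadoop.tmp.dir and the JournalNode edits dir); double-check the paths before deleting anything:

```bash
# Stop HDFS first, then wipe the metadata on every node
sbin/stop-dfs.sh
rm -rf /app/hadoop-2.6.5/tmp/dfs /app/hadoop-2.6.5/data/jn

# Then redo the sequence: start journalnodes, format on nn1, -bootstrapStandby on nn2
```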
Detailed steps:

```bash
root@master:~# cat /usr/bin/util.sh
for i in root@master root@slave1 root@slave2
do
    echo "------------------------- $i ---------------------"
    ssh $i '/usr/lib/jvm/jdk1.8.0_112/bin/jps'
done
root@master:~# chmod +x /usr/bin/util.sh
root@master:/app/hadoop-2.6.5# ./sbin/hadoop-daemons.sh start journalnode
master: starting journalnode, logging to /app/hadoop-2.6.5/logs/hadoop-root-journalnode-master.out
slave1: starting journalnode, logging to /app/hadoop-2.6.5/logs/hadoop-root-journalnode-slave1.out
slave2: starting journalnode, logging to /app/hadoop-2.6.5/logs/hadoop-root-journalnode-slave2.out
```

Check that the JournalNodes started on every node, then format nn1:

```bash
root@master:/app/hadoop-2.6.5# util.sh
------------------------- root@master ---------------------
32580 Jps
32330 JournalNode
------------------------- root@slave1 ---------------------
29268 Jps
29132 JournalNode
------------------------- root@slave2 ---------------------
22721 JournalNode
22855 Jps
root@master:/app/hadoop-2.6.5# ./bin/hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

19/10/12 11:22:29 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = master/192.168.2.20
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.6.5
STARTUP_MSG:   classpath = ... (omitted)
STARTUP_MSG:   build = https://github.com/apache/hadoop.git -r e8c9fe0b4c252caf2ebf1464220599650f119997; compiled by 'sjlee' on 2016-10-02T23:43Z
STARTUP_MSG:   java = 1.8.0_112
************************************************************/
19/10/12 11:22:29 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
19/10/12 11:22:29 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-db0aa966-c538-481d-862f-cea4099e8980
19/10/12 11:22:29 INFO namenode.FSNamesystem: No KeyProvider found.
19/10/12 11:22:29 INFO namenode.FSNamesystem: fsLock is fair:true
19/10/12 11:22:29 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
19/10/12 11:22:29 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
19/10/12 11:22:29 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
19/10/12 11:22:29 INFO blockmanagement.BlockManager: The block deletion will start around 2019 Oct 12 11:22:29
19/10/12 11:22:29 INFO util.GSet: Computing capacity for map BlocksMap
19/10/12 11:22:29 INFO util.GSet: VM type       = 64-bit
19/10/12 11:22:29 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
19/10/12 11:22:29 INFO util.GSet: capacity      = 2^21 = 2097152 entries
19/10/12 11:22:29 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
19/10/12 11:22:29 INFO blockmanagement.BlockManager: defaultReplication         = 1
19/10/12 11:22:29 INFO blockmanagement.BlockManager: maxReplication             = 512
19/10/12 11:22:29 INFO blockmanagement.BlockManager: minReplication             = 1
19/10/12 11:22:29 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
19/10/12 11:22:29 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
19/10/12 11:22:29 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
19/10/12 11:22:29 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
19/10/12 11:22:29 INFO namenode.FSNamesystem: fsOwner             = root (auth:SIMPLE)
19/10/12 11:22:29 INFO namenode.FSNamesystem: supergroup          = supergroup
19/10/12 11:22:29 INFO namenode.FSNamesystem: isPermissionEnabled = true
19/10/12 11:22:29 INFO namenode.FSNamesystem: Determined nameservice ID: mycluster
19/10/12 11:22:29 INFO namenode.FSNamesystem: HA Enabled: true
19/10/12 11:22:29 INFO namenode.FSNamesystem: Append Enabled: true
19/10/12 11:22:29 INFO util.GSet: Computing capacity for map INodeMap
19/10/12 11:22:29 INFO util.GSet: VM type       = 64-bit
19/10/12 11:22:29 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
19/10/12 11:22:29 INFO util.GSet: capacity      = 2^20 = 1048576 entries
19/10/12 11:22:29 INFO namenode.NameNode: Caching file names occuring more than 10 times
19/10/12 11:22:29 INFO util.GSet: Computing capacity for map cachedBlocks
19/10/12 11:22:29 INFO util.GSet: VM type       = 64-bit
19/10/12 11:22:29 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
19/10/12 11:22:29 INFO util.GSet: capacity      = 2^18 = 262144 entries
19/10/12 11:22:29 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
19/10/12 11:22:29 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
19/10/12 11:22:29 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
19/10/12 11:22:29 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
19/10/12 11:22:29 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
19/10/12 11:22:29 INFO util.GSet: Computing capacity for map NameNodeRetryCache
19/10/12 11:22:29 INFO util.GSet: VM type       = 64-bit
19/10/12 11:22:29 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
19/10/12 11:22:29 INFO util.GSet: capacity      = 2^15 = 32768 entries
19/10/12 11:22:29 INFO namenode.NNConf: ACLs enabled? false
19/10/12 11:22:29 INFO namenode.NNConf: XAttrs enabled? true
19/10/12 11:22:29 INFO namenode.NNConf: Maximum size of an xattr: 16384
19/10/12 11:22:30 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1468894087-192.168.2.20-1570850550134
19/10/12 11:22:30 INFO common.Storage: Storage directory /app/hadoop-2.6.5/tmp/dfs/name has been successfully formatted.
19/10/12 11:22:30 INFO namenode.FSImageFormatProtobuf: Saving image file /app/hadoop-2.6.5/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
19/10/12 11:22:30 INFO namenode.FSImageFormatProtobuf: Image file /app/hadoop-2.6.5/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 321 bytes saved in 0 seconds.
19/10/12 11:22:30 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
19/10/12 11:22:30 INFO util.ExitUtil: Exiting with status 0
19/10/12 11:22:30 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.2.20
************************************************************/
root@master:/app/hadoop-2.6.5# ./sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /app/hadoop-2.6.5/logs/hadoop-root-namenode-master.out
root@master:/app/hadoop-2.6.5# util.sh
------------------------- root@master ---------------------
401 NameNode
32330 JournalNode
636 Jps
------------------------- root@slave1 ---------------------
29652 Jps
29498 NameNode
29132 JournalNode
------------------------- root@slave2 ---------------------
22721 JournalNode
23096 Jps
root@master:/app/hadoop-2.6.5# sbin/hadoop-daemons.sh start datanode
master: starting datanode, logging to /app/hadoop-2.6.5/logs/hadoop-root-datanode-master.out
slave1: starting datanode, logging to /app/hadoop-2.6.5/logs/hadoop-root-datanode-slave1.out
slave2: starting datanode, logging to /app/hadoop-2.6.5/logs/hadoop-root-datanode-slave2.out
root@master:/app/hadoop-2.6.5# util.sh
------------------------- root@master ---------------------
784 DataNode
401 NameNode
968 Jps
32330 JournalNode
------------------------- root@slave1 ---------------------
29906 Jps
29498 NameNode
29132 JournalNode
29758 DataNode
------------------------- root@slave2 ---------------------
22721 JournalNode
23189 DataNode
23327 Jps
```
Check the web UI:

(Screenshots: the HDFS Overview pages of both NameNodes.)
At this point the NameNodes on master and slave1 are both in standby; the web UIs of both show standby mode:

```
Overview 'master:8020' (standby)
Overview 'slave1:8020' (standby)
```

```bash
root@master:/app/hadoop-2.6.5# bin/hdfs haadmin -getServiceState nn1
standby
root@slave1:/app/hadoop-2.6.5# bin/hdfs haadmin -getServiceState nn1
standby
```

With no Active NameNode, the HDFS filesystem cannot be used yet.
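As a quick illustration of why: with both NameNodes in standby, any client read is rejected with a StandbyException. A sketch (exact wording varies by version):

```bash
root@master:/app/hadoop-2.6.5# bin/hdfs dfs -ls /
# Fails while no NameNode is active, roughly:
# ls: Operation category READ is not supported in state standby
```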
- Transition [nn1] to Active:

```bash
root@master:/app/hadoop-2.6.5# bin/hdfs haadmin -transitionToActive nn1
```

- Check whether nn1 is now Active:

```bash
root@master:/app/hadoop-2.6.5# bin/hdfs haadmin -getServiceState nn1
active
```
7. HDFS-HA manual failover

Simulate a manual failover:

```bash
root@master:/app/hadoop-2.6.5# sbin/hadoop-daemon.sh stop namenode
stopping namenode
root@master:/app/hadoop-2.6.5# bin/hdfs haadmin -getServiceState nn2
standby
root@master:/app/hadoop-2.6.5# bin/hdfs haadmin -transitionToActive nn2
19/10/12 13:49:15 INFO ipc.Client: Retrying connect to server: master/192.168.2.20:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Unexpected error occurred  Call From master/192.168.2.20 to master:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
Usage: HAAdmin [-transitionToActive <serviceId> [--forceactive]]
root@master:/app/hadoop-2.6.5# sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /app/hadoop-2.6.5/logs/hadoop-root-namenode-master.out
root@master:/app/hadoop-2.6.5# bin/hdfs haadmin -transitionToActive nn2
root@master:/app/hadoop-2.6.5# bin/hdfs haadmin -getServiceState nn2
active
root@master:/app/hadoop-2.6.5# bin/hdfs haadmin -getServiceState nn1
standby
```

During a manual HA transition, both NameNodes must be running; otherwise the transition fails, as the log above shows.
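An alternative worth knowing, sketched here with the nn1/nn2 IDs from this page: the -failover subcommand of hdfs haadmin coordinates the whole swap in one step, fencing the old active if needed, instead of transitioning one node at a time:

```bash
# Fail over from nn1 to nn2 in a single step
bin/hdfs haadmin -failover nn1 nn2
bin/hdfs haadmin -getServiceState nn1   # expected: standby
bin/hdfs haadmin -getServiceState nn2   # expected: active
```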
8. HDFS-HA automatic failover

Summary

1. Configuration

(1) Add to hdfs-site.xml:

```xml
<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
</property>
```

(2) Add to core-site.xml:

```xml
<property>
    <name>ha.zookeeper.quorum</name>
    <value>master:2181,slave1:2181,slave2:2181</value>
</property>
```

2. Startup

(1) Stop all HDFS services: `sbin/stop-dfs.sh`
(2) Start the ZooKeeper cluster: `bin/zkServer.sh start`
(3) Initialize the HA state in ZooKeeper: `bin/hdfs zkfc -formatZK`
(4) Start HDFS: `sbin/start-dfs.sh`
(5) Start the DFSZKFailoverController on each NameNode host; the NameNode on the host whose zkfc starts first becomes Active: `sbin/hadoop-daemon.sh start zkfc`

3. Verification

(1) Kill the Active NameNode process: `kill -9 <NameNode pid>`
(2) Disconnect the Active NameNode machine from the network: `service network stop`
- Configure HDFS-HA automatic failover

Add to hdfs-site.xml:

```xml
<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
</property>
```

The resulting hdfs-site.xml:

```xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>master:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>slave1:8020</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>master:50070</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>slave1:50070</value>
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://master:8485;slave1:8485;slave2:8485/mycluster</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!-- Disable permission checking -->
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
</configuration>
```
- Add ha.zookeeper.quorum to core-site.xml

The resulting core-site.xml:

```xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/app/hadoop-2.6.5/tmp</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>master:2181,slave1:2181,slave2:2181</value>
    </property>
</configuration>
```
- Start the ZooKeeper cluster:

```bash
root@master:~# /usr/local/zookeeper/bin/zkServer.sh start
JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
root@master:~# /usr/local/zookeeper/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower
root@slave1:/app/hadoop-2.6.5# /usr/local/zookeeper/bin/zkServer.sh start
JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
root@slave1:/app/hadoop-2.6.5# /usr/local/zookeeper/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: leader
root@slave2:~# /usr/local/zookeeper/bin/zkServer.sh start
JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
root@slave2:~# /usr/local/zookeeper/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower
```
- Stop all HDFS services:

```bash
root@master:~# cd /app/hadoop-2.6.5/
root@master:/app/hadoop-2.6.5# ./sbin/stop-dfs.sh
Stopping namenodes on [master slave1]
master: stopping namenode
slave1: stopping namenode
master: stopping datanode
slave1: stopping datanode
slave2: stopping datanode
Stopping journal nodes [master slave1 slave2]
master: stopping journalnode
slave1: stopping journalnode
slave2: stopping journalnode
Stopping ZK Failover Controllers on NN hosts [master slave1]
master: no zkfc to stop
slave1: no zkfc to stop
```
- Initialize the HA state in ZooKeeper:

```bash
bin/hdfs zkfc -formatZK
```

Initialize ZooKeeper:

```bash
root@master:/app/hadoop-2.6.5# bin/hdfs zkfc -formatZK
19/10/12 14:32:42 INFO tools.DFSZKFailoverController: Failover controller configured for NameNode NameNode at master/192.168.2.20:8020
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Client environment:host.name=master
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Client environment:java.version=1.8.0_112
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/jdk1.8.0_112/jre
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Client environment:java.class.path=... (long classpath omitted)
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/app/hadoop-2.6.5/lib/native
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Client environment:os.version=5.0.0-23-generic
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Client environment:user.name=root
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Client environment:user.home=/root
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Client environment:user.dir=/app/hadoop-2.6.5
19/10/12 14:32:42 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181,slave1:2181,slave2:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@77847718
19/10/12 14:32:42 INFO zookeeper.ClientCnxn: Opening socket connection to server slave2/192.168.2.22:2181. Will not attempt to authenticate using SASL (unknown error)
19/10/12 14:32:42 INFO zookeeper.ClientCnxn: Socket connection established to slave2/192.168.2.22:2181, initiating session
19/10/12 14:32:43 INFO zookeeper.ClientCnxn: Session establishment complete on server slave2/192.168.2.22:2181, sessionid = 0x36dbea61b6c0000, negotiated timeout = 5000
19/10/12 14:32:43 INFO ha.ActiveStandbyElector: Session connected.
19/10/12 14:32:43 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/mycluster in ZK.
19/10/12 14:32:43 INFO zookeeper.ZooKeeper: Session: 0x36dbea61b6c0000 closed
19/10/12 14:32:43 INFO zookeeper.ClientCnxn: EventThread shut down
```
Inspect ZooKeeper:

```bash
root@master:~# /usr/local/zookeeper/bin/zkCl<TAB>
zkCleanup.sh  zkCli.cmd  zkCli.sh
root@master:~# /usr/local/zookeeper/bin/zkCli.sh
Connecting to localhost:2181
2019-10-12 14:34:27,526 [myid:] - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
2019-10-12 14:34:27,528 [myid:] - INFO  [main:Environment@100] - Client environment:host.name=master
2019-10-12 14:34:27,528 [myid:] - INFO  [main:Environment@100] - Client environment:java.version=1.8.0_112
2019-10-12 14:34:27,528 [myid:] - INFO  [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2019-10-12 14:34:27,528 [myid:] - INFO  [main:Environment@100] - Client environment:java.home=/usr/lib/jvm/jdk1.8.0_112/jre
2019-10-12 14:34:27,528 [myid:] - INFO  [main:Environment@100] - Client environment:java.class.path=... (omitted)
2019-10-12 14:34:27,529 [myid:] - INFO  [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2019-10-12 14:34:27,529 [myid:] - INFO  [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2019-10-12 14:34:27,529 [myid:] - INFO  [main:Environment@100] - Client environment:java.compiler=<NA>
2019-10-12 14:34:27,529 [myid:] - INFO  [main:Environment@100] - Client environment:os.name=Linux
2019-10-12 14:34:27,529 [myid:] - INFO  [main:Environment@100] - Client environment:os.arch=amd64
2019-10-12 14:34:27,529 [myid:] - INFO  [main:Environment@100] - Client environment:os.version=5.0.0-23-generic
2019-10-12 14:34:27,529 [myid:] - INFO  [main:Environment@100] - Client environment:user.name=root
2019-10-12 14:34:27,529 [myid:] - INFO  [main:Environment@100] - Client environment:user.home=/root
2019-10-12 14:34:27,530 [myid:] - INFO  [main:Environment@100] - Client environment:user.dir=/root
2019-10-12 14:34:27,530 [myid:] - INFO  [main:ZooKeeper@438] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@42110406
Welcome to ZooKeeper!
2019-10-12 14:34:27,559 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@966] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2019-10-12 14:34:27,595 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@849] - Socket connection established to localhost/127.0.0.1:2181, initiating session
2019-10-12 14:34:27,629 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1207] - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x16dbea616f50000, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null

[zk: localhost:2181(CONNECTED) 0] ls /
[cluster, storm, brokers, zookeeper, hadoop-ha, admin, isr_change_notification, log_dir_event_notification, controller_epoch, name, consumers, latest_producer_id_block, config, hbase]
[zk: localhost:2181(CONNECTED) 1] ls /hadoop-ha
[mycluster]
[zk: localhost:2181(CONNECTED) 2] ls /hadoop-ha/mycluster
[]
```
- Distribute:

```bash
rsync -avz /app/hadoop-2.6.5/etc/hadoop/* slave1:/app/hadoop-2.6.5/etc/hadoop/
rsync -avz /app/hadoop-2.6.5/etc/hadoop/* slave2:/app/hadoop-2.6.5/etc/hadoop/
```

Distribute the modified configuration files:

```bash
root@master:/app/hadoop-2.6.5# rsync -avz /app/hadoop-2.6.5/etc/hadoop/* slave1:/app/hadoop-2.6.5/etc/hadoop/
sending incremental file list
core-site.xml
hdfs-site.xml

sent 785 bytes  received 90 bytes  1,750.00 bytes/sec
total size is 80,979  speedup is 92.55
root@master:/app/hadoop-2.6.5# rsync -avz /app/hadoop-2.6.5/etc/hadoop/* slave2:/app/hadoop-2.6.5/etc/hadoop/
sending incremental file list
core-site.xml
hdfs-site.xml

sent 1,067 bytes  received 90 bytes  2,314.00 bytes/sec
total size is 80,979  speedup is 69.99
```
- Start HDFS:

```bash
sbin/start-dfs.sh
```

Start HDFS:

```bash
root@master:/app/hadoop-2.6.5# sbin/start-dfs.sh
Starting namenodes on [master slave1]
master: starting namenode, logging to /app/hadoop-2.6.5/logs/hadoop-root-namenode-master.out
slave1: starting namenode, logging to /app/hadoop-2.6.5/logs/hadoop-root-namenode-slave1.out
master: starting datanode, logging to /app/hadoop-2.6.5/logs/hadoop-root-datanode-master.out
slave1: starting datanode, logging to /app/hadoop-2.6.5/logs/hadoop-root-datanode-slave1.out
slave2: starting datanode, logging to /app/hadoop-2.6.5/logs/hadoop-root-datanode-slave2.out
Starting journal nodes [master slave1 slave2]
master: starting journalnode, logging to /app/hadoop-2.6.5/logs/hadoop-root-journalnode-master.out
slave1: starting journalnode, logging to /app/hadoop-2.6.5/logs/hadoop-root-journalnode-slave1.out
slave2: starting journalnode, logging to /app/hadoop-2.6.5/logs/hadoop-root-journalnode-slave2.out
Starting ZK Failover Controllers on NN hosts [master slave1]
master: starting zkfc, logging to /app/hadoop-2.6.5/logs/hadoop-root-zkfc-master.out
slave1: starting zkfc, logging to /app/hadoop-2.6.5/logs/hadoop-root-zkfc-slave1.out
```
- Verify:

```bash
root@master:/app/hadoop-2.6.5# bin/hdfs haadmin -getServiceState nn1
active
root@master:/app/hadoop-2.6.5# bin/hdfs haadmin -getServiceState nn2
standby
root@master:/app/hadoop-2.6.5# util.sh
------------------------- root@master ---------------------
8401 DFSZKFailoverController
7890 DataNode
8628 Jps
8153 JournalNode
4938 QuorumPeerMain
7726 NameNode
------------------------- root@slave1 ---------------------
2466 Jps
1892 NameNode
2020 DataNode
2165 JournalNode
2341 DFSZKFailoverController
32078 QuorumPeerMain
------------------------- root@slave2 ---------------------
23977 QuorumPeerMain
25274 Jps
25002 DataNode
25147 JournalNode
```
- Test

The Active NameNode is currently on master. Kill the Active NameNode process on master and watch whether automatic failover kicks in (a sketch follows).
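A sketch of that test, using the jps-based util.sh from above; the pid shown is just an example of what jps might print on master:

```bash
# On master: find the NameNode pid and kill it hard
jps | grep NameNode          # e.g. "7726 NameNode" (pid is an example)
kill -9 7726

# Give ZKFC a few seconds to detect the failure and run the election,
# then nn2 should have taken over:
sleep 10
bin/hdfs haadmin -getServiceState nn2   # expected: active
```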