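Hadoop HA: ZKFC node exits abnormally
Recently, the Hadoop ZKFC process in one of our company's analytics systems has been exiting abnormally on a regular basis. The relevant log output is as follows: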
2020-08-26 14:30:14,455 INFO org.apache.zookeeper.ClientCnxn: Socket connection established, initiating session, client: /10.100.232.31:33989, server: hadoop-dn1/10.100.232.35:2181
2020-08-26 14:30:14,457 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hadoop-dn1/10.100.232.35:2181, sessionid = 0x46dc45ca67201f1, negotiated timeout = 5000
2020-08-26 14:30:14,459 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
2020-08-26 14:30:17,792 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 3335ms for sessionid 0x46dc45ca67201f1, closing socket connection and attempting reconnect
2020-08-26 14:30:17,899 FATAL org.apache.hadoop.ha.ActiveStandbyElector: Received stat error from Zookeeper. code:CONNECTIONLOSS. Not retrying further znode monitoring connection errors.
2020-08-26 14:30:18,179 INFO org.apache.zookeeper.ZooKeeper: Session: 0x46dc45ca67201f1 closed
2020-08-26 14:30:18,180 FATAL org.apache.hadoop.ha.ZKFailoverController: Fatal error occurred:Received stat error from Zookeeper. code:CONNECTIONLOSS. Not retrying further znode monitoring connection errors.
2020-08-26 14:30:18,180 INFO org.apache.hadoop.ipc.Server: Stopping server on 8019
2020-08-26 14:30:18,180 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x46dc45ca67201f1
2020-08-26 14:30:18,181 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x46dc45ca67201f1
2020-08-26 14:30:18,181 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x46dc45ca67201f1
2020-08-26 14:30:18,181 INFO org.apache.hadoop.ha.ActiveStandbyElector: Yielding from election
2020-08-26 14:30:18,181 INFO org.apache.hadoop.ha.HealthMonitor: Stopping HealthMonitor thread
2020-08-26 14:30:18,181 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8019
2020-08-26 14:30:18,181 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2020-08-26 14:30:18,181 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x46dc45ca67201f1
2020-08-26 14:30:18,181 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
How can this be tuned or fixed? Any suggestions or ideas from those who know would be appreciated.
1 reply
空心菜 - Facing the sunshine, growing strong
Upvoted by: Ansible
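1. First, test the network quality with tools such as telnet and ping, to see whether there is a problem between the host running ZKFC and the ZooKeeper servers. In the log above, the client gave up after hearing nothing from the server for 3335 ms, which is roughly two thirds of the negotiated 5000 ms session timeout.
2. Tune the ZooKeeper configuration so that there is more headroom before a session times out. Note that the server clamps the session timeout a client requests to the range between minSessionTimeout and maxSessionTimeout (by default 2 and 20 times tickTime).
3. In the Hadoop configuration files, it is recommended to set the following properties: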
hdfs-site.xml:
core-site.xml:
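The network check from step 1 can also be scripted so that it runs repeatedly from the ZKFC host. The following is only a minimal sketch, not part of the original answer: the function names `tcp_connect_ms`, `probe` and `summarize` are made up for illustration, and it only measures how long a TCP connection to the ZooKeeper client port takes, which is a rough proxy for network quality rather than a full ZooKeeper health check.

```python
import socket
import time


def tcp_connect_ms(host, port, timeout=3.0):
    """Return the time in milliseconds taken to open a TCP connection,
    or None if the connection fails or times out."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return None


def probe(host, port, attempts=5, interval=1.0, timeout=3.0):
    """Connect to host:port several times.
    Returns a list of latencies in ms; None marks a failed attempt."""
    results = []
    for i in range(attempts):
        results.append(tcp_connect_ms(host, port, timeout))
        if i < attempts - 1:
            time.sleep(interval)
    return results


def summarize(results):
    """Count failures and report the slowest successful connection."""
    ok = [r for r in results if r is not None]
    return {
        "attempts": len(results),
        "failures": len(results) - len(ok),
        "max_ms": max(ok) if ok else None,
    }
```

For example, `summarize(probe("hadoop-dn1", 2181))` probes the server named in the log above; the other ensemble members are not listed in the post, so they would have to be added by hand. Any failures, or a `max_ms` approaching the roughly 3.3 s at which the client in the log gave up, would point at the network rather than at Hadoop itself.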
If the problem persists after adjusting the parameters above, the remaining directions to investigate are: