ambari - Stuck adding a host to HDP
Problem description
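I have an HDP cluster with 2 nodes. One machine failed and the cluster lost that host's heartbeat. There was no way to recover the machine, so we ended up reinstalling Ubuntu and configuring it again.
The host could not be restored in Ambari (I tried giving it the same FQDN, IP, configuration ...), so I changed the hostname and tried to add it as a completely new host.
Step 2 of the install wizard does finish with a "Success" status, but the wizard then stays stuck for hours on the message "Please wait while the hosts are being checked for potential problems...".
I am attaching the ambari-server log, the ambari-agent log, the Ambari registration log, and an image of the error.
Do you have any idea what is happening and how to fix it?
Thanks.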
ambari-server log:
12 jun 2018 09:34:55,667 WARN [ambari-action-scheduler] ExecutionCommandWrapper:185 - Unable to lookup the cluster by ID; assuming that there is no cluster and therefore no configs for this execution command: Cluster not found, clusterName=clusterID=-1
12 jun 2018 09:34:56,675 WARN [ambari-action-scheduler] ExecutionCommandWrapper:185 - Unable to lookup the cluster by ID; assuming that there is no cluster and therefore no configs for this execution command: Cluster not found, clusterName=clusterID=-1
12 jun 2018 09:34:57,683 WARN [ambari-action-scheduler] ExecutionCommandWrapper:185 - Unable to lookup the cluster by ID; assuming that there is no cluster and therefore no configs for this execution command: Cluster not found, clusterName=clusterID=-1
ambari-agent log:
INFO 2018-06-12 09:00:16,026 Controller.py:512 - Registration response from bigdata was OK
INFO 2018-06-12 09:00:16,026 Controller.py:517 - Resetting ActionQueue...
INFO 2018-06-12 09:00:26,035 Controller.py:304 - Heartbeat (response id = 0) with server is running...
INFO 2018-06-12 09:00:26,036 Controller.py:311 - Building heartbeat message
INFO 2018-06-12 09:00:26,037 Heartbeat.py:90 - Adding host info/state to heartbeat message.
INFO 2018-06-12 09:00:26,099 logger.py:75 - Testing the JVM's JCE policy to see it if supports an unlimited key length.
INFO 2018-06-12 09:00:26,168 Hardware.py:176 - Some mount points were ignored: /dev, /run, /, /dev/shm, /run/lock, /sys/fs/cgroup, /boot, /run/user/1000, /run/user/0, /run/user/994
INFO 2018-06-12 09:00:26,169 Controller.py:320 - Sending Heartbeat (id = 0)
INFO 2018-06-12 09:00:26,174 Controller.py:332 - Heartbeat response received (id = 1)
INFO 2018-06-12 09:00:26,174 Controller.py:341 - Heartbeat interval is 10 seconds
INFO 2018-06-12 09:00:26,174 Controller.py:377 - Updating configurations from heartbeat
INFO 2018-06-12 09:00:26,174 Controller.py:386 - Adding cancel/execution commands
INFO 2018-06-12 09:00:26,174 Controller.py:403 - Adding recovery commands
INFO 2018-06-12 09:00:26,174 Controller.py:471 - Waiting 9.9 for next heartbeat
INFO 2018-06-12 09:00:36,075 Controller.py:478 - Wait for next heartbeat over
INFO 2018-06-12 09:34:38,350 Controller.py:512 - Registration response from bigdata was OK
INFO 2018-06-12 09:34:38,350 Controller.py:517 - Resetting ActionQueue...
Registration log:
', None)
Connection to master.es closed.
SSH command execution finished
host=master.es, exitcode=0
Command end time 2018-06-12 09:34:38
Registering with the server...
Registering with the server...
Solution
Run ambari-agent reset on all nodes.
Change the cluster name.
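The two steps above could be scripted roughly as follows. This is only a sketch under several assumptions that the answer does not state: the nodes are reachable over SSH with sudo rights, the agent is stopped before the reset and started again afterwards, the cluster is renamed through the Ambari REST API (the Ambari web UI offers a rename as well), and the server listens on the default port 8080 with an admin user. All host names, the cluster names, and the helper functions run, reset_agent, and rename_cluster are hypothetical placeholders. The script only prints the commands unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch only: every host name and cluster name here is a placeholder.
set -eu

AMBARI_SERVER="${AMBARI_SERVER:-ambari.example.com}"  # hypothetical Ambari server FQDN
DRY_RUN="${DRY_RUN:-1}"                               # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Step 1: reset the agent on one node so it registers with the server again.
# "ambari-agent reset" expects the Ambari server host name as its argument.
# Stopping the agent first is an assumption made here for safety.
reset_agent() {
  run ssh "$1" sudo ambari-agent stop
  run ssh "$1" sudo ambari-agent reset "$AMBARI_SERVER"
  run ssh "$1" sudo ambari-agent start
}

# Step 2: rename the cluster ($1 = current name, $2 = new name) via the REST API.
# curl prompts for the admin password because only the user name is given.
rename_cluster() {
  run curl -u admin -H "X-Requested-By: ambari" -X PUT \
    -d "{\"Clusters\":{\"cluster_name\":\"$2\"}}" \
    "http://$AMBARI_SERVER:8080/api/v1/clusters/$1"
}

# Example usage (placeholder names):
#   for node in node1.example.com node2.example.com; do reset_agent "$node"; done
#   rename_cluster old_cluster new_cluster
```

Review the dry-run output first, then rerun with DRY_RUN=0 once the printed commands match your environment.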