Flink 1.10.0 - The heartbeat of ResourceManager with id xxxx timed out

Problem description

I am running a Flink standalone HA cluster in Kubernetes. The same setup works perfectly with Flink 1.9, but with Flink 1.10 it keeps failing with the error below.

INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor  - The heartbeat of ResourceManager with id 783439e4ead380c60498e32a8e1c0ce3 timed out.
DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor  - Close ResourceManager connection 783439e4ead380c60498e32a8e1c0ce3.
org.apache.flink.runtime.taskexecutor.exceptions.TaskManagerException: The heartbeat of ResourceManager with id 783439e4ead380c60498e32a8e1c0ce3 timed out.
        at org.apache.flink.runtime.taskexecutor.TaskExecutor$ResourceManagerHeartbeatListener.notifyHeartbeatTimeout(TaskExecutor.java:1842)
        at org.apache.flink.runtime.heartbeat.HeartbeatMonitorImpl.run(HeartbeatMonitorImpl.java:109)

flink-conf.yaml:

jobmanager.rpc.address: xx.xxx.xx.xxx
jobmanager.rpc.port: 6123
jobmanager.heap.size: 1500m
taskmanager.memory.process.size: 4000m
taskmanager.numberOfTaskSlots: 1
parallelism.default: 1
jobmanager.execution.failover-strategy: region
state.backend: filesystem
state.checkpoints.dir: file:///checkpoints
state.savepoints.dir: file:///savepoints
high-availability: zookeeper
high-availability.jobmanager.port: 50010
high-availability.zookeeper.quorum: xx.xx.xx.xx:xxxx
high-availability.zookeeper.path.root: /flink
high-availability.cluster-id: /ABCD
high-availability.storageDir: file:///recovery
heartbeat.interval: 60000
heartbeat.timeout: 60000
taskmanager.debug.memory.log: true
taskmanager.debug.memory.log-interval: 10000
taskmanager.memory.managed.fraction: 0.1
blob.server.port: 6124
query.server.port: 6125

Tags: apache-flink, flink-streaming

Solution
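
A possible cause, offered as an assumption rather than a confirmed fix: the configuration above sets heartbeat.interval and heartbeat.timeout to the same value (60000 ms). Heartbeat requests are sent once per interval, and the receiving side reports a timeout when none arrives within heartbeat.timeout, so with equal values every heartbeat races the timeout. Flink's defaults are 10000 ms for the interval and 50000 ms for the timeout. A minimal sketch of the change in flink-conf.yaml, using those defaults as illustrative values:

```yaml
# Keep the heartbeat interval well below the timeout (both values in milliseconds).
# 10000 / 50000 are Flink's documented defaults, used here only as an example.
heartbeat.interval: 10000
heartbeat.timeout: 50000
```

Other values should work as well, as long as the interval is clearly smaller than the timeout. Whether this alone explains the difference between 1.9 and 1.10 has not been verified here.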

