Flink EMR Deployment can't pick up Yarn context and executes only as a local application

Problem description

I'm deploying my Flink app on AWS EMR 6.2

Flink version: 1.11.2

I configured the step with:

Arguments: "flink-yarn-session -d -n 2"

as described here: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/flink-jobs.html
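
For reference, adding a step with those arguments from the AWS CLI would look roughly like the following; the cluster ID is a placeholder, and the JSON simply mirrors the arguments above:

# cluster ID below is a placeholder
aws emr add-steps \
  --cluster-id j-XXXXXXXXXXXXX \
  --steps '[{"Type":"CUSTOM_JAR","Name":"Yarn session","ActionOnFailure":"CONTINUE","Jar":"command-runner.jar","Args":["flink-yarn-session","-d","-n","2"]}]'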

Both the cluster and the step are in state 'running'.

The controller log shows:

INFO startExec 'hadoop jar /mnt/var/lib/hadoop/steps/s-309XLKSZGN4V4/trigger-service-***.jar flink-yarn-session -d -n 2'

However, the syslog shows that the application didn't pick up the Yarn context.

2021-02-24 13:19:57,772 INFO org.apache.flink.runtime.minicluster.MiniCluster (main): Starting Flink Mini Cluster

I pulled out the class name returned by StreamExecutionEnvironment.getExecutionEnvironment in my app, and indeed it is LocalStreamEnvironment.

The application itself runs properly on the JobMaster instance as a local app.

Tags: apache-flink, amazon-emr

Solution


It turns out that -n has been deprecated in newer versions of Flink, so, on the advice of AWS Support, I used the following command instead:

flink-yarn-session -d -s 2  

Meaning of the flags:

-d,--detached    Start detached

-s,--slots    Number of slots per TaskManager
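
Once the detached session is up, a quick sanity check (assuming shell access to the EMR master node) is to ask YARN whether the session application is actually running:

# run on the master node; lists YARN applications in RUNNING state
yarn application -list -appStates RUNNING

If the session started correctly, a Flink session application shows up in that list; when the job falls back to a local MiniCluster, nothing Flink-related appears there.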

In addition, in order for Flink to run, two steps need to be created:

--steps '[{"Args":["flink-yarn-session","-d","-s","2"],"Type":"CUSTOM_JAR","MainClass":"","ActionOnFailure":"CANCEL_AND_WAIT","Jar":"command-runner.jar","Properties":"","Name":"Yarn Long Running Session"},{"Args":["bash","-c","aws s3 cp s3://<URL of Application Jar> .;/usr/bin/flink run -m yarn-cluster -p 2 -d <Application Jar>"],"Type":"CUSTOM_JAR","MainClass":"","ActionOnFailure":"CANCEL_AND_WAIT","Jar":"command-runner.jar","Properties":"","Name":"Flink Job"}]'

The first one uses command-runner.jar to execute

flink-yarn-session -d -s 2

The second one uses command-runner.jar to download my application Jar from S3 and run the job with flink on yarn-cluster.
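
Written out, the second step effectively runs the following two commands on the master node; the bucket and jar names here are hypothetical placeholders for the real S3 location of the application Jar:

# placeholders: replace bucket/jar with the real application Jar location
aws s3 cp s3://my-bucket/my-flink-app.jar .
/usr/bin/flink run -m yarn-cluster -p 2 -d my-flink-app.jar

Here -m yarn-cluster makes flink run submit the job to the YARN cluster instead of a local MiniCluster, -p 2 sets the parallelism, and -d detaches the client once the job has been submitted.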

