EMR cluster hangs with step in "Running/Pending" state

Problem description

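I'm launching an EMR cluster through the Java SDK with a custom jar step. The cluster launches successfully, but after bootstrapping it gets stuck with the step in the Pending/Running state. I can't even SSH into the machine.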
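Instead of waiting on a step indefinitely, it can help to poll its state with a timeout so a hang shows up as a definite result. The sketch below is a minimal, hypothetical helper that uses only the JDK. The state strings follow the EMR step states (PENDING, CANCEL_PENDING, RUNNING, COMPLETED, CANCELLED, FAILED, INTERRUPTED). The class name `StepWaiter` and the scripted fake supplier in `main` are illustrative assumptions. In real code the `Supplier` would presumably wrap something like `emr.describeStep(new DescribeStepRequest().withClusterId(...).withStepId(...)).getStep().getStatus().getState()` from the AWS SDK for Java v1. Check that call chain against the SDK documentation before relying on it.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Set;
import java.util.function.Supplier;

class StepWaiter {
    // Terminal step states; anything else (PENDING, CANCEL_PENDING, RUNNING) means "not done yet".
    static final Set<String> TERMINAL = Set.of("COMPLETED", "CANCELLED", "FAILED", "INTERRUPTED");

    // Polls stateSupplier until it reports a terminal state or the timeout elapses,
    // then returns the last state seen. A non-terminal result means the step was
    // still pending/running when polling gave up.
    static String waitForStep(Supplier<String> stateSupplier, Duration timeout, Duration pollInterval) {
        Instant deadline = Instant.now().plus(timeout);
        String state = stateSupplier.get();
        while (!TERMINAL.contains(state) && Instant.now().isBefore(deadline)) {
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag and stop polling
                break;
            }
            state = stateSupplier.get();
        }
        return state;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a DescribeStep call: PENDING, PENDING, RUNNING, RUNNING, then COMPLETED.
        String[] script = {"PENDING", "PENDING", "RUNNING", "RUNNING", "COMPLETED"};
        int[] calls = {0};
        Supplier<String> fakeState = () -> script[Math.min(calls[0]++, script.length - 1)];
        System.out.println(waitForStep(fakeState, Duration.ofSeconds(5), Duration.ofMillis(10))); // prints COMPLETED

        // A step stuck in RUNNING returns after the timeout instead of blocking forever.
        System.out.println(waitForStep(() -> "RUNNING", Duration.ofMillis(50), Duration.ofMillis(10))); // prints RUNNING
    }
}
```

The helper returns the last observed state instead of throwing. The caller can then decide whether a step that is still RUNNING after the timeout should be cancelled, logged, or investigated through the cluster logs.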

Here is the code I use to launch the cluster with the custom jar step:

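        // AWS SDK for Java v1 (EMR client). InvalidS3ObjectException is not imported here; it comes from elsewhere in the project.
        import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce;
        import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClientBuilder;
        import com.amazonaws.services.elasticmapreduce.model.*;

        // S3 location of the custom jar; the path must be a quoted String literal.
        String dataTrasnferJar = "s3://test/testApplication.jar";
        if (dataTrasnferJar == null || dataTrasnferJar.isEmpty())
            throw new InvalidS3ObjectException(
                    "EMR custom jar file path is null/empty. Please provide a valid jar file path");

        // Custom jar step. Without withMainClass(...), the jar's manifest must declare a Main-Class.
        HadoopJarStepConfig customJarConfig = new HadoopJarStepConfig().withJar(dataTrasnferJar);
        // CONTINUE: if this step fails, the cluster moves on instead of terminating.
        StepConfig customJarStep = new StepConfig("Mongo_to_S3_Data_Transfer", customJarConfig)
                .withActionOnFailure(ActionOnFailure.CONTINUE);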

        AmazonElasticMapReduce emr = AmazonElasticMapReduceClientBuilder.standard()
                .withCredentials(awsCredentialsProvider)
                .withRegion(region)
                .build();

        Application spark = new Application().withName("Spark");

        String clusterName  = "my-cluster-" + System.currentTimeMillis();
        RunJobFlowRequest request = new RunJobFlowRequest()
                .withName(clusterName)
                .withReleaseLabel("emr-6.0.0")
                .withApplications(spark)
                .withVisibleToAllUsers(true)
                .withSteps(customJarStep)
                .withLogUri(loggingS3Bucket)
                .withServiceRole("EMR_DefaultRole")
                .withJobFlowRole("EMR_EC2_DefaultRole")
                .withInstances(new JobFlowInstancesConfig()
                    .withEc2KeyName(key_pair)
                    .withInstanceCount(instanceCount)
                    .withEc2SubnetIds(subnetId)
                    .withAdditionalMasterSecurityGroups(securityGroup)
                    // Keep the cluster running after all steps finish; it must then be terminated explicitly.
                    .withKeepJobFlowAliveWhenNoSteps(true)
                    .withMasterInstanceType(instanceType));

        RunJobFlowResult result = emr.runJobFlow(request);
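The null/empty check in the code above can never fail, because the path is a hard-coded literal. It would also accept a value that is not an S3 URI at all. A slightly stricter check is sketched below using `java.net.URI` from the JDK. `JarPathCheck` and `isS3JarPath` are hypothetical names. The sketch assumes, as in the question, that the jar always lives in S3 and ends in `.jar`. EMR steps can also reference other jar locations (for example `command-runner.jar`), and this check would reject those.

```java
import java.net.URI;
import java.net.URISyntaxException;

class JarPathCheck {
    // Returns true only for values shaped like s3://bucket/path/to/app.jar.
    static boolean isS3JarPath(String path) {
        if (path == null || path.isEmpty()) {
            return false;
        }
        try {
            URI uri = new URI(path);
            String bucket = uri.getAuthority(); // "test" for s3://test/testApplication.jar
            String key = uri.getPath();         // "/testApplication.jar"
            return "s3".equals(uri.getScheme())
                    && bucket != null && !bucket.isEmpty()
                    && key != null && key.length() > 1
                    && key.endsWith(".jar");
        } catch (URISyntaxException e) {
            return false; // not even a syntactically valid URI
        }
    }

    public static void main(String[] args) {
        System.out.println(isS3JarPath("s3://test/testApplication.jar")); // true
        System.out.println(isS3JarPath("test/testApplication.jar"));      // false: no s3:// scheme
        System.out.println(isS3JarPath("s3://test/"));                    // false: no object key
    }
}
```

In the question's code, the condition `dataTrasnferJar == null || dataTrasnferJar.isEmpty()` could then become `!JarPathCheck.isS3JarPath(dataTrasnferJar)`.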

Tags: java, amazon-web-services, amazon-emr

Solution


The EMR release emr-6.0.0 is still under development. Can you try the same thing with emr-5.29.0?

