sparklyr on AWS EMR does not start running

Problem description

I get this error when running a Spark step on AWS EMR:

Attaching package: ‘sparklyr’

The following object is masked from ‘package:stats’:

    filter

Error in spark_connect_gateway(gatewayAddress, gatewayPort, sessionId,  : 
  Gateway in localhost:8880 did not respond.


Try running `options(sparklyr.log.console = TRUE)` followed by `sc <- spark_connect(...)` for more debugging info.
Calls: spark_connect ... start_shell -> withCallingHandlers -> spark_connect_gateway
Execution halted
Command exiting with ret '1'

The code I am running is:

library(sparklyr)

Sys.setenv(SPARK_HOME="/usr/lib/spark/")
config <- spark_config()
sc <- spark_connect(master = "yarn", config = config, version = '1.6.2')

I have also tried using yarn-client instead of yarn, and changing the version, but I always get the same error. This is the EMR step configuration:

"Steps": [
        {
            "Name": "Pyramid cee",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar",
                "Args": [
                    "s3://bucket/emr/runner.sh"
                ]
            }
        }
    ]

Also, this is the content of runner.sh:

#!/bin/bash
aws s3 cp s3://bucket/emr/sparklyr.R /tmp/sparklyr.R
Rscript /tmp/sparklyr.R

How can I fix this error?

Tags: r, amazon-web-services, apache-spark, amazon-emr, sparklyr

Solution
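
The error says the gateway on localhost:8880 never responded, and sparklyr starts that gateway by launching spark-submit from the directory given in SPARK_HOME. One cheap thing to rule out is that spark-submit is missing on the node where the step runs. The following is a minimal sketch of a check that could be added to runner.sh; the helper name check_spark_home is hypothetical, and this is a diagnostic aid under those assumptions, not a confirmed fix for the error above.

```shell
#!/bin/bash
# Sketch of a pre-flight check for runner.sh.
# Assumption (not from the original post): the helper name check_spark_home,
# and that a missing spark-submit is worth ruling out before R is started.

# Return 0 if "<dir>/bin/spark-submit" exists and is executable, 1 otherwise.
check_spark_home() {
    local spark_home="${1%/}"   # drop one trailing slash, if present
    if [ -x "${spark_home}/bin/spark-submit" ]; then
        return 0
    fi
    echo "spark-submit not found under ${spark_home}/bin" >&2
    return 1
}

# Intended use inside runner.sh (paths taken from the question):
#   check_spark_home /usr/lib/spark/ || exit 1
#   aws s3 cp s3://bucket/emr/sparklyr.R /tmp/sparklyr.R
#   Rscript /tmp/sparklyr.R
```

If the check passes, the next step is the one the error message itself recommends: set `options(sparklyr.log.console = TRUE)` in sparklyr.R before calling `spark_connect(...)`, so that more debugging information appears in the step's output.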

