apache-spark - How can I run Apache Spark jobs on AWS ECS Fargate instead of EMR/EC2?
Problem description
AWS Elastic MapReduce has plenty of features, but it also has some rough edges, and I would like to sidestep it for some fairly cheap computations I want to run in Apache Spark. Specifically, I want to see whether I can run a (Scala) Spark application on AWS ECS/Fargate. Better still if I can get it working with just a single container running in client/local mode.
I first built a Spark distribution with hadoop3 (for AWS STS support) and the kubernetes profile selected:
# in apache/spark git repository under tag v2.4.0
./dev/make-distribution.sh --name hadoop3-kubernetes -Phadoop-3.1 -Pkubernetes -T4
Then I built a generic Spark docker image from that distribution:
docker build -t spark:2.4.0-hadoop3.1 -f kubernetes/dockerfiles/spark/Dockerfile .
Then in my project I built another docker image on top of it, copying my sbt-assembled uberjar into the working directory and setting the entrypoint to the spark-submit shell script:
# Dockerfile
FROM spark:2.4.0-hadoop3.1
COPY target/scala-2.11/my-spark-assembly.jar .
ENTRYPOINT [ "/opt/spark/bin/spark-submit" ]
On my local machine, I can run the application by supplying the appropriate arguments in the docker-compose command specification:
# docker-compose.yml
...
command:
- --master
- local[*]
- --deploy-mode
- client
- my-spark-assembly.jar
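For readers less familiar with how a Dockerfile `ENTRYPOINT` composes with a compose-file `command` list: the container above effectively runs the following invocation (shown here only to make the composition explicit, not as a new command from the post):

```shell
# The ENTRYPOINT plus the command list expand to this single call
# inside the container:
/opt/spark/bin/spark-submit \
  --master 'local[*]' \
  --deploy-mode client \
  my-spark-assembly.jar
```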
Unfortunately, on Fargate ECS it fails fast, writing the following stack trace to CloudWatch:
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.spark.SparkConf$.<init>(SparkConf.scala:714)
at org.apache.spark.SparkConf$.<clinit>(SparkConf.scala)
at org.apache.spark.SparkConf$$anonfun$getOption$1.apply(SparkConf.scala:388)
at org.apache.spark.SparkConf$$anonfun$getOption$1.apply(SparkConf.scala:388)
at scala.Option.orElse(Option.scala:289)
at org.apache.spark.SparkConf.getOption(SparkConf.scala:388)
at org.apache.spark.SparkConf.get(SparkConf.scala:250)
at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:463)
at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
at org.apache.spark.deploy.SparkSubmit$$anonfun$2.apply(SparkSubmit.scala:334)
at org.apache.spark.deploy.SparkSubmit$$anonfun$2.apply(SparkSubmit.scala:334)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:334)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.UnknownHostException: c0d66fa49434: c0d66fa49434: Name does not resolve
at java.net.InetAddress.getLocalHost(InetAddress.java:1506)
at org.apache.spark.util.Utils$.findLocalInetAddress(Utils.scala:946)
at org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$localIpAddress$lzycompute(Utils.scala:939)
at org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$localIpAddress(Utils.scala:939)
at org.apache.spark.util.Utils$$anonfun$localCanonicalHostName$1.apply(Utils.scala:996)
at org.apache.spark.util.Utils$$anonfun$localCanonicalHostName$1.apply(Utils.scala:996)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.util.Utils$.localCanonicalHostName(Utils.scala:996)
at org.apache.spark.internal.config.package$.<init>(package.scala:296)
at org.apache.spark.internal.config.package$.<clinit>(package.scala)
... 18 more
Caused by: java.net.UnknownHostException: c0d66fa49434: Name does not resolve
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
at java.net.InetAddress.getLocalHost(InetAddress.java:1501)
... 27 more
Has anyone had any success with a similar attempt?
Solution
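The stack trace bottoms out in `java.net.UnknownHostException`: Spark's `Utils.findLocalInetAddress` calls `InetAddress.getLocalHost()`, and on Fargate the container's auto-generated hostname (`c0d66fa49434` above) is not resolvable from inside the container. Spark checks the `SPARK_LOCAL_IP` environment variable before falling back to that hostname lookup, so one common workaround is to pin the local address explicitly. The wrapper script below is a minimal sketch of that idea (the script name and the `/etc/hosts` fallback are my additions, not something from the original post, and I have not verified this on Fargate itself):

```shell
#!/bin/sh
# entrypoint.sh -- hypothetical wrapper around spark-submit.
# Works around java.net.UnknownHostException on Fargate, where the
# container hostname does not resolve.

# Spark consults SPARK_LOCAL_IP before calling InetAddress.getLocalHost(),
# so setting it skips the failing hostname lookup entirely.
export SPARK_LOCAL_IP="${SPARK_LOCAL_IP:-127.0.0.1}"

# Belt and braces: also make the hostname resolvable, in case other
# code paths still call InetAddress.getLocalHost().
if ! getent hosts "$(hostname)" >/dev/null 2>&1; then
  echo "127.0.0.1 $(hostname)" >> /etc/hosts
fi

exec /opt/spark/bin/spark-submit "$@"
```

In the Dockerfile from the question, this would mean copying the script in and pointing `ENTRYPOINT` at it instead of at `spark-submit` directly; alternatively, setting `SPARK_LOCAL_IP=127.0.0.1` in the ECS task definition's `environment` section should have the same effect without a wrapper.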