scala - How to run a Glue job locally?
Problem description
I have a project set up as described here. But the following code:
import com.amazonaws.services.glue.{AWSGlueClientBuilder, GlueContext}
import org.apache.spark.SparkContext
import org.slf4j.LoggerFactory

object MyGlueJob {
  private val logger = LoggerFactory.getLogger(getClass)

  def main(sysArgs: Array[String]): Unit = {
    val spark: SparkContext = SparkContext.getOrCreate()
    val glueContext: GlueContext = new GlueContext(spark)
    val awsGlueClient = AWSGlueClientBuilder.defaultClient
  }
}
fails with the error:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/11/21 15:40:32 INFO SparkContext: Running Spark version 2.4.3
19/11/21 15:40:33 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: A master URL must be set in your configuration
at org.apache.spark.SparkContext.<init>(SparkContext.scala:368)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:117)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2544)
at MyGlueJob$.main(MyGlueJob.scala:13)
at MyGlueJob.main(MyGlueJob.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:66)
19/11/21 15:40:33 ERROR Utils: Uncaught exception in thread main
java.lang.NullPointerException
at org.apache.spark.SparkContext.org$apache$spark$SparkContext$$postApplicationEnd(SparkContext.scala:2416)
at org.apache.spark.SparkContext$$anonfun$stop$1.apply$mcV$sp(SparkContext.scala:1931)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1930)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:585)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:117)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2544)
at MyGlueJob$.main(MyGlueJob.scala:13)
at MyGlueJob.main(MyGlueJob.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:66)
19/11/21 15:40:33 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:66)
Caused by: org.apache.spark.SparkException: A master URL must be set in your configuration
at org.apache.spark.SparkContext.<init>(SparkContext.scala:368)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:117)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2544)
at MyGlueJob$.main(MyGlueJob.scala:13)
at MyGlueJob.main(MyGlueJob.scala)
... 5 more
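Obviously the master URL has to be set, but how can I set it from the command line or through a system variable (i.e. without touching the code)?
I have also read that the --master parameter can solve the problem, but adding it to the program arguments in the IntelliJ IDEA run configuration did nothing.
The key question: is it possible to run a Glue job locally and also be able to run it in AWS without touching the code?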
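A note on why the --master attempt has no effect, with a sketch of a no-code-change alternative. `--master` is an option of the `spark-submit` launcher; text typed into an IDE's program-arguments field is simply passed to `main` as `sysArgs`, which this job never reads. The no-argument `SparkContext.getOrCreate()` builds a `new SparkConf()`, and that constructor loads every JVM system property whose key starts with `spark.`, so adding `-Dspark.master=local[*]` to the run configuration's VM options (not its program arguments) should supply the master URL for local runs. The pure-Scala sketch below only imitates that lookup so it can run without Spark on the classpath; `SparkDefaultsSketch` and `sparkDefaults` are hypothetical names, not part of Spark's API.

```scala
// Simplified imitation of how SparkConf picks up defaults: every JVM system
// property whose key starts with "spark." is copied into the configuration.
// Illustrative sketch only, not Spark's actual implementation.
object SparkDefaultsSketch {
  // Hypothetical helper: keep only the "spark.*" entries of a properties map
  def sparkDefaults(props: Map[String, String]): Map[String, String] =
    props.filter { case (key, _) => key.startsWith("spark.") }

  def main(args: Array[String]): Unit = {
    // Simulates launching the JVM with -Dspark.master=local[*]
    System.setProperty("spark.master", "local[*]")
    val conf = sparkDefaults(sys.props.toMap)
    // Program arguments (args) play no part in the lookup above
    println(conf.get("spark.master"))
  }
}
```

Running it prints `Some(local[*])`; whatever is in `args` does not change the result.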
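Solution
You can create the Spark session explicitly and set whatever parameters you need, but I can't say whether that will ultimately work in Glue. Below is the session I use to test Spark jobs locally, even though I eventually run them in Glue. I only test pure Spark code.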
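import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.sql.SparkSession

lazy val spark: SparkSession = {
  // Make Hadoop treat this process as user "hduser" instead of the OS login user
  UserGroupInformation.setLoginUser(UserGroupInformation.createRemoteUser("hduser"))
  SparkSession
    .builder()
    .master("local") // single-threaded local mode, hard-coded here
    .appName("spark unit test")
    .getOrCreate()
}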
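The session above hard-codes `.master("local")`, which is one reason it may not carry over to Glue unchanged. A hedged variant is to force a local master only when none has been configured already, for instance by `spark-submit` or by a `-Dspark.master=...` VM option. The sketch below keeps that decision in plain Scala so it can be tested without Spark; `LocalMasterFallback` and `localMasterFallback` are hypothetical names, and the sketch assumes the relevant settings are visible as a map of `spark.*` properties, an assumption that has not been checked against the Glue runtime.

```scala
// Sketch: decide whether to force a local master. If a launcher has already
// supplied "spark.master" (e.g. spark-submit, or -Dspark.master=... in the IDE),
// leave it alone; otherwise fall back to local mode for IDE runs.
object LocalMasterFallback {
  // Hypothetical default for local runs; "local[*]" uses one thread per logical core
  val LocalMaster = "local[*]"

  def localMasterFallback(conf: Map[String, String]): Option[String] =
    if (conf.contains("spark.master")) None else Some(LocalMaster)

  def main(args: Array[String]): Unit = {
    println(localMasterFallback(Map.empty))                     // Some(local[*])
    println(localMasterFallback(Map("spark.master" -> "yarn"))) // None
  }
}
```

With Spark on the classpath, the result could be applied to a builder such as `SparkSession.builder().appName("spark unit test")`, e.g. `localMasterFallback(sys.props.toMap).foreach(m => builder.master(m))`, so the master is only set in code when nothing else provided one.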
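As for the key question, running a Glue job locally and being able to run it in AWS without touching the code: you can run any code using a development endpoint and Zeppelin. See the AWS documentation.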