Spark Structured Streaming stops immediately after the application is created

Problem description

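I have 3 workers showing in the Spark UI, running on 3 guest VMs (1 master and 2 slaves). I am trying to run a Twitter streaming application, but I get the error below. I have seen other threads about similar problems, but I don't understand the error. Some of them mention memory issues, so I checked the host's CPU and RAM while running spark-submit. RAM looks fine, but CPU goes up to 100%. Could that be the problem? Any input that helps me understand the relationship between the master and worker nodes after the cluster starts would be much appreciated.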

Master output:

Spark Command: /usr/lib/jvm/java-11-openjdk-amd64/bin/java -cp /opt/spark/conf/:/opt/spark/jars/* -Xmx1g org.apache.spark.deploy.master.Master --host pd --port 7077 --webui-port 8080
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/03/05 00:11:05 INFO Master: Started daemon with process name: 10852@pd
21/03/05 00:11:06 INFO SignalUtils: Registering signal handler for TERM
21/03/05 00:11:06 INFO SignalUtils: Registering signal handler for HUP
21/03/05 00:11:06 INFO SignalUtils: Registering signal handler for INT
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
21/03/05 00:11:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/03/05 00:11:13 INFO SecurityManager: Changing view acls to: *****
21/03/05 00:11:13 INFO SecurityManager: Changing modify acls to: *****
21/03/05 00:11:13 INFO SecurityManager: Changing view acls groups to: 
21/03/05 00:11:13 INFO SecurityManager: Changing modify acls groups to: 
21/03/05 00:11:13 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(*****); groups with view permissions: Set(); users  with modify permissions: Set(*****); groups with modify permissions: Set()
21/03/05 00:11:17 INFO Utils: Successfully started service 'sparkMaster' on port 7077.
21/03/05 00:11:18 INFO Master: Starting Spark master at spark://pd:7077
21/03/05 00:11:18 INFO Master: Running Spark version 3.1.1
21/03/05 00:11:19 INFO Utils: Successfully started service 'MasterUI' on port 8080.
21/03/05 00:11:20 INFO MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://pd:8080
21/03/05 00:11:21 INFO Master: Registering worker 10.0.2.15:34975 with 1 cores, 1024.0 MiB RAM
21/03/05 00:11:21 INFO Master: I have been elected leader! New state: ALIVE
21/03/05 00:11:25 INFO Master: Registering worker ***.***.**.104:43757 with 1 cores, 1024.0 MiB RAM
21/03/05 00:11:28 INFO Master: Registering worker 10.0.2.15:34975 with 1 cores, 1024.0 MiB RAM
21/03/05 00:11:31 INFO Master: Registering worker ***.***.**.103:38529 with 1 cores, 1024.0 MiB RAM
21/03/05 00:19:02 INFO Master: Registering app TwitterSentimentAnalysis
21/03/05 00:19:02 INFO Master: Registered app TwitterSentimentAnalysis with ID app-20210305001902-0000
21/03/05 00:19:02 INFO Master: Launching executor app-20210305001902-0000/0 on worker worker-20210305001307-***.***.**.103-38529
21/03/05 00:19:03 INFO Master: Launching executor app-20210305001902-0000/1 on worker worker-20210305001123-***.***.**.104-43757
21/03/05 00:19:03 INFO Master: Launching executor app-20210305001902-0000/2 on worker worker-20210305001308-10.0.2.15-34975
21/03/05 00:19:40 INFO Master: Received unregister request from application app-20210305001902-0000
21/03/05 00:19:40 INFO Master: Removing app app-20210305001902-0000
21/03/05 00:19:40 WARN Master: Got status update for unknown executor app-20210305001902-0000/0
21/03/05 00:19:40 WARN Master: Got status update for unknown executor app-20210305001902-0000/2
21/03/05 00:19:41 WARN Master: Got status update for unknown executor app-20210305001902-0000/1
21/03/05 00:19:41 INFO Master: ***.***.**.104:60360 got disassociated, removing it.
21/03/05 00:19:41 INFO Master: pd:40465 got disassociated, removing it.
21/03/05 00:47:39 INFO Master: Registering app TwitterSentimentAnalysis
21/03/05 00:47:39 INFO Master: Registered app TwitterSentimentAnalysis with ID app-20210305004739-0001
21/03/05 00:47:39 INFO Master: Launching executor app-20210305004739-0001/0 on worker worker-20210305001307-***.***.**.103-38529
21/03/05 00:47:39 INFO Master: Launching executor app-20210305004739-0001/1 on worker worker-20210305001123-***.***.**.104-43757
21/03/05 00:47:39 INFO Master: Launching executor app-20210305004739-0001/2 on worker worker-20210305001308-10.0.2.15-34975
21/03/05 00:48:05 INFO Master: Received unregister request from application app-20210305004739-0001
21/03/05 00:48:05 INFO Master: Removing app app-20210305004739-0001
21/03/05 00:48:05 WARN Master: Got status update for unknown executor app-20210305004739-0001/2
21/03/05 00:48:05 WARN Master: Got status update for unknown executor app-20210305004739-0001/0
21/03/05 00:48:06 WARN Master: Got status update for unknown executor app-20210305004739-0001/1
21/03/05 00:48:06 INFO Master: ***.***.**.104:60388 got disassociated, removing it.
21/03/05 00:48:06 INFO Master: pd:34007 got disassociated, removing it.
21/03/05 00:48:52 INFO Master: Registering app TwitterSentimentAnalysis
21/03/05 00:48:52 INFO Master: Registered app TwitterSentimentAnalysis with ID app-20210305004852-0002
21/03/05 00:48:52 INFO Master: Launching executor app-20210305004852-0002/0 on worker worker-20210305001307-***.***.**.103-38529
21/03/05 00:48:52 INFO Master: Launching executor app-20210305004852-0002/1 on worker worker-20210305001123-***.***.**.104-43757
21/03/05 00:48:52 INFO Master: Launching executor app-20210305004852-0002/2 on worker worker-20210305001308-10.0.2.15-34975
21/03/05 00:49:19 INFO Master: Received unregister request from application app-20210305004852-0002
21/03/05 00:49:19 INFO Master: Removing app app-20210305004852-0002
21/03/05 00:49:19 WARN Master: Got status update for unknown executor app-20210305004852-0002/0
21/03/05 00:49:19 WARN Master: Got status update for unknown executor app-20210305004852-0002/2
21/03/05 00:49:19 WARN Master: Got status update for unknown executor app-20210305004852-0002/1
21/03/05 00:49:19 INFO Master: ***.***.**.104:60402 got disassociated, removing it.
21/03/05 00:49:19 INFO Master: pd:42517 got disassociated, removing it.
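
To make the pattern in the master output easier to see, here is a minimal sketch that measures how long each application stayed registered before the master received its unregister request. It only reads a handful of lines copied from the log above; the names `MASTER_LOG`, `app_lifetimes`, and the two regular expressions are illustrative and not part of Spark.

```python
import re
from datetime import datetime

# A few lines copied verbatim from the master output above.
MASTER_LOG = """\
21/03/05 00:19:02 INFO Master: Registered app TwitterSentimentAnalysis with ID app-20210305001902-0000
21/03/05 00:19:40 INFO Master: Received unregister request from application app-20210305001902-0000
21/03/05 00:47:39 INFO Master: Registered app TwitterSentimentAnalysis with ID app-20210305004739-0001
21/03/05 00:48:05 INFO Master: Received unregister request from application app-20210305004739-0001
21/03/05 00:48:52 INFO Master: Registered app TwitterSentimentAnalysis with ID app-20210305004852-0002
21/03/05 00:49:19 INFO Master: Received unregister request from application app-20210305004852-0002
"""

# Timestamp layout used by these log lines (yy/MM/dd HH:mm:ss).
TS_FORMAT = "%y/%m/%d %H:%M:%S"
REGISTERED = re.compile(
    r"^(\S+ \S+) INFO Master: Registered app \S+ with ID (\S+)$")
UNREGISTER = re.compile(
    r"^(\S+ \S+) INFO Master: Received unregister request from application (\S+)$")


def app_lifetimes(log_text):
    """Return {app_id: seconds between registration and unregister request}."""
    started = {}
    lifetimes = {}
    for line in log_text.splitlines():
        m = REGISTERED.match(line)
        if m:
            started[m.group(2)] = datetime.strptime(m.group(1), TS_FORMAT)
            continue
        m = UNREGISTER.match(line)
        if m and m.group(2) in started:
            ended = datetime.strptime(m.group(1), TS_FORMAT)
            delta = ended - started[m.group(2)]
            lifetimes[m.group(2)] = int(delta.total_seconds())
    return lifetimes


lifetimes = app_lifetimes(MASTER_LOG)
for app_id, seconds in lifetimes.items():
    print(f"{app_id}: {seconds} s")
```

For the three runs above this gives 38 s, 26 s, and 27 s. In each case the unregister request comes from the application itself, and the "unknown executor" warnings only appear after the master has already removed the app.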

Worker output:

Spark Command: /usr/lib/jvm/java-11-openjdk-amd64/bin/java -cp /opt/spark/conf/:/opt/spark/jars/* -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://pd:7077
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/03/05 00:11:15 INFO Worker: Started daemon with process name: 10993@pd
21/03/05 00:11:15 INFO SignalUtils: Registering signal handler for TERM
21/03/05 00:11:15 INFO SignalUtils: Registering signal handler for HUP
21/03/05 00:11:15 INFO SignalUtils: Registering signal handler for INT
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
21/03/05 00:11:19 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/03/05 00:11:20 INFO SecurityManager: Changing view acls to: ****
21/03/05 00:11:20 INFO SecurityManager: Changing modify acls to: ****
21/03/05 00:11:20 INFO SecurityManager: Changing view acls groups to: 
21/03/05 00:11:20 INFO SecurityManager: Changing modify acls groups to: 
21/03/05 00:11:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(*******); groups with view permissions: Set(); users  with modify permissions: Set(*****); groups with modify permissions: Set()
21/03/05 00:11:23 INFO Utils: Successfully started service 'sparkWorker' on port 43757.
21/03/05 00:11:23 INFO Worker: Worker decommissioning not enabled, SIGPWR will result in exiting.
21/03/05 00:11:24 INFO Worker: Starting Spark worker ***.***.**.104:43757 with 1 cores, 1024.0 MiB RAM
21/03/05 00:11:24 INFO Worker: Running Spark version 3.1.1
21/03/05 00:11:24 INFO Worker: Spark home: /opt/spark
21/03/05 00:11:24 INFO ResourceUtils: ==============================================================
21/03/05 00:11:24 INFO ResourceUtils: No custom resources configured for spark.worker.
21/03/05 00:11:24 INFO ResourceUtils: ==============================================================
21/03/05 00:11:24 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
21/03/05 00:11:24 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://pd:8081
21/03/05 00:11:24 INFO Worker: Connecting to master pd:7077...
21/03/05 00:11:24 INFO TransportClientFactory: Successfully created connection to pd/***.***.**.104:7077 after 98 ms (0 ms spent in bootstraps)
21/03/05 00:11:25 INFO Worker: Successfully registered with master spark://pd:7077
21/03/05 00:19:03 INFO Worker: Asked to launch executor app-20210305001902-0000/1 for TwitterSentimentAnalysis
21/03/05 00:19:03 INFO SecurityManager: Changing view acls to: *****
21/03/05 00:19:03 INFO SecurityManager: Changing modify acls to: *****
21/03/05 00:19:03 INFO SecurityManager: Changing view acls groups to: 
21/03/05 00:19:03 INFO SecurityManager: Changing modify acls groups to: 
21/03/05 00:19:03 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(*****); groups with view permissions: Set(); users  with modify permissions: Set(*****); groups with modify permissions: Set()
21/03/05 00:19:04 INFO ExecutorRunner: Launch command: "/usr/lib/jvm/java-11-openjdk-amd64/bin/java" "-cp" "/opt/spark/conf/:/opt/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=40465" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@pd:40465" "--executor-id" "1" "--hostname" "***.***.**.104" "--cores" "1" "--app-id" "app-20210305001902-0000" "--worker-url" "spark://Worker@***.***.**.104:43757"
21/03/05 00:19:41 INFO Worker: Executor app-20210305001902-0000/1 finished with state EXITED message Command exited with code 0 exitStatus 0
21/03/05 00:19:41 INFO ExternalShuffleBlockResolver: Clean up non-shuffle and non-RDD files associated with the finished executor 1
21/03/05 00:19:41 INFO ExternalShuffleBlockResolver: Executor is not registered (appId=app-20210305001902-0000, execId=1)
21/03/05 00:19:41 INFO Worker: Asked to kill unknown executor app-20210305001902-0000/1
21/03/05 00:19:41 INFO ExternalShuffleBlockResolver: Application app-20210305001902-0000 removed, cleanupLocalDirs = true
21/03/05 00:19:41 INFO Worker: Cleaning up local directories for application app-20210305001902-0000
21/03/05 00:47:39 INFO Worker: Asked to launch executor app-20210305004739-0001/1 for TwitterSentimentAnalysis
21/03/05 00:47:39 INFO SecurityManager: Changing view acls to: *****
21/03/05 00:47:39 INFO SecurityManager: Changing modify acls to: ******
21/03/05 00:47:39 INFO SecurityManager: Changing view acls groups to: 
21/03/05 00:47:39 INFO SecurityManager: Changing modify acls groups to: 
21/03/05 00:47:39 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(********); groups with view permissions: Set(); users  with modify permissions: Set(delalma); groups with modify permissions: Set()
21/03/05 00:47:41 INFO ExecutorRunner: Launch command: "/usr/lib/jvm/java-11-openjdk-amd64/bin/java" "-cp" "/opt/spark/conf/:/opt/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=34007" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@pd:34007" "--executor-id" "1" "--hostname" "***.***.**.104" "--cores" "1" "--app-id" "app-20210305004739-0001" "--worker-url" "spark://Worker@***.***.**.104:43757"
21/03/05 00:48:05 INFO Worker: Asked to kill executor app-20210305004739-0001/1
21/03/05 00:48:05 INFO ExecutorRunner: Runner thread for executor app-20210305004739-0001/1 interrupted
21/03/05 00:48:05 INFO ExecutorRunner: Killing process!
21/03/05 00:48:06 INFO Worker: Executor app-20210305004739-0001/1 finished with state KILLED exitStatus 0
21/03/05 00:48:06 INFO ExternalShuffleBlockResolver: Clean up non-shuffle and non-RDD files associated with the finished executor 1
21/03/05 00:48:06 INFO ExternalShuffleBlockResolver: Executor is not registered (appId=app-20210305004739-0001, execId=1)
21/03/05 00:48:06 INFO Worker: Cleaning up local directories for application app-20210305004739-0001
21/03/05 00:48:06 INFO ExternalShuffleBlockResolver: Application app-20210305004739-0001 removed, cleanupLocalDirs = true
21/03/05 00:48:52 INFO Worker: Asked to launch executor app-20210305004852-0002/1 for TwitterSentimentAnalysis
21/03/05 00:48:52 INFO SecurityManager: Changing view acls to: *******
21/03/05 00:48:52 INFO SecurityManager: Changing modify acls to: ******
21/03/05 00:48:52 INFO SecurityManager: Changing view acls groups to: 
21/03/05 00:48:52 INFO SecurityManager: Changing modify acls groups to: 
21/03/05 00:48:52 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(*******); groups with view permissions: Set(); users  with modify permissions: Set(delalma); groups with modify permissions: Set()
21/03/05 00:48:54 INFO ExecutorRunner: Launch command: "/usr/lib/jvm/java-11-openjdk-amd64/bin/java" "-cp" "/opt/spark/conf/:/opt/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=42517" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@pd:42517" "--executor-id" "1" "--hostname" "***.***.**.104" "--cores" "1" "--app-id" "app-20210305004852-0002" "--worker-url" "spark://Worker@***.***.**.104:43757"
21/03/05 00:49:19 INFO Worker: Asked to kill executor app-20210305004852-0002/1
21/03/05 00:49:19 INFO ExecutorRunner: Runner thread for executor app-20210305004852-0002/1 interrupted
21/03/05 00:49:19 INFO ExecutorRunner: Killing process!
21/03/05 00:49:19 INFO Worker: Executor app-20210305004852-0002/1 finished with state KILLED exitStatus 143
21/03/05 00:49:19 INFO ExternalShuffleBlockResolver: Clean up non-shuffle and non-RDD files associated with the finished executor 1
21/03/05 00:49:19 INFO ExternalShuffleBlockResolver: Executor is not registered (appId=app-20210305004852-0002, execId=1)
21/03/05 00:49:19 INFO Worker: Cleaning up local directories for application app-20210305004852-0002
21/03/05 00:49:19 INFO ExternalShuffleBlockResolver: Application app-20210305004852-0002 removed, cleanupLocalDirs = true
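
The three executor completion lines in the worker output report different states and exit statuses (EXITED 0, KILLED 0, KILLED 143). The sketch below reads them with the common POSIX convention that a status above 128 means "terminated by signal number status - 128". That convention is an assumption about how the status was produced, and the helper names `describe_exit` and `executor_exits` are made up for illustration.

```python
import re
import signal

# Executor completion lines copied verbatim from the worker output above.
WORKER_LOG = """\
21/03/05 00:19:41 INFO Worker: Executor app-20210305001902-0000/1 finished with state EXITED message Command exited with code 0 exitStatus 0
21/03/05 00:48:06 INFO Worker: Executor app-20210305004739-0001/1 finished with state KILLED exitStatus 0
21/03/05 00:49:19 INFO Worker: Executor app-20210305004852-0002/1 finished with state KILLED exitStatus 143
"""

FINISHED = re.compile(
    r"Executor (\S+) finished with state (\w+).*exitStatus (\d+)$")


def describe_exit(status):
    """Describe an exit status using the 128 + signal-number convention."""
    if status == 0:
        return "clean exit"
    if status > 128:
        try:
            return "terminated by " + signal.Signals(status - 128).name
        except ValueError:
            # Not a signal number known on this platform.
            return f"exit status {status}"
    return f"error exit {status}"


def executor_exits(log_text):
    """Return (executor_id, state, description) for each completion line."""
    results = []
    for line in log_text.splitlines():
        m = FINISHED.search(line)
        if m:
            results.append(
                (m.group(1), m.group(2), describe_exit(int(m.group(3)))))
    return results


exits = executor_exits(WORKER_LOG)
for executor_id, state, description in exits:
    print(executor_id, state, description)
```

Read this way, status 143 corresponds to SIGTERM (128 + 15), which fits the "Killing process!" line the worker logs after being asked to kill the executor. None of the three statuses points to the executor failing on its own.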

Log of the worker on the master node, from the UI:

Spark Executor Command: "/usr/lib/jvm/java-11-openjdk-amd64/bin/java" "-cp" "/opt/spark/conf/:/opt/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=42517" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@pd:42517" "--executor-id" "1" "--hostname" "***.***.**.104" "--cores" "1" "--app-id" "app-20210305004852-0002" "--worker-url" "spark://Worker@***.***.**.104:43757"
========================================

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/03/05 00:49:02 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 11595@pd
21/03/05 00:49:02 INFO SignalUtils: Registering signal handler for TERM
21/03/05 00:49:02 INFO SignalUtils: Registering signal handler for HUP
21/03/05 00:49:02 INFO SignalUtils: Registering signal handler for INT
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
21/03/05 00:49:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/03/05 00:49:07 INFO SecurityManager: Changing view acls to: ******
21/03/05 00:49:07 INFO SecurityManager: Changing modify acls to: ******
21/03/05 00:49:07 INFO SecurityManager: Changing view acls groups to: 
21/03/05 00:49:07 INFO SecurityManager: Changing modify acls groups to: 
21/03/05 00:49:07 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(*****); groups with view permissions: Set(); users  with modify permissions: Set(****); groups with modify permissions: Set()
21/03/05 00:49:08 INFO TransportClientFactory: Successfully created connection to pd/***.***.**.104:42517 after 336 ms (0 ms spent in bootstraps)
21/03/05 00:49:09 INFO SecurityManager: Changing view acls to: ******
21/03/05 00:49:09 INFO SecurityManager: Changing modify acls to: *****
21/03/05 00:49:09 INFO SecurityManager: Changing view acls groups to: 
21/03/05 00:49:09 INFO SecurityManager: Changing modify acls groups to: 
21/03/05 00:49:09 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(*****); groups with view permissions: Set(); users  with modify permissions: Set(********); groups with modify permissions: Set()
21/03/05 00:49:10 INFO TransportClientFactory: Successfully created connection to pd/***.***.**.104:42517 after 26 ms (0 ms spent in bootstraps)
21/03/05 00:49:10 INFO DiskBlockManager: Created local directory at /tmp/spark-64626d64-5b29-4a13-8ee2-cc98d61f7a2c/executor-db784ed0-ccba-4e37-b97b-325221d118e0/blockmgr-3da06abd-90af-4a44-a423-c4b5253427df
21/03/05 00:49:10 INFO MemoryStore: MemoryStore started with capacity 413.9 MiB
21/03/05 00:49:12 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@pd:42517
21/03/05 00:49:12 INFO WorkerWatcher: Connecting to worker spark://Worker@***.***.**.104:43757
21/03/05 00:49:12 INFO TransportClientFactory: Successfully created connection to /***.***.**.104:43757 after 27 ms (0 ms spent in bootstraps)
21/03/05 00:49:12 INFO WorkerWatcher: Successfully connected to spark://Worker@***.***.**.104:43757
21/03/05 00:49:12 INFO ResourceUtils: ==============================================================
21/03/05 00:49:12 INFO ResourceUtils: No custom resources configured for spark.executor.
21/03/05 00:49:12 INFO ResourceUtils: ==============================================================
21/03/05 00:49:12 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
21/03/05 00:49:12 INFO Executor: Starting executor ID 1 on host ***.***.**.104
21/03/05 00:49:13 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41941.
21/03/05 00:49:13 INFO NettyBlockTransferService: Server created on ***.***.**.104:41941
21/03/05 00:49:13 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/03/05 00:49:13 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(1, ***.***.**.104, 41941, None)
21/03/05 00:49:13 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(1, ***.***.**.104, 41941, None)
21/03/05 00:49:13 INFO BlockManager: Initialized BlockManager: BlockManagerId(1, ***.***.**.104, 41941, None)
21/03/05 00:49:19 INFO CoarseGrainedExecutorBackend: Driver commanded a shutdown
21/03/05 00:49:19 ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM

Tags: apache-spark, spark-structured-streaming

Solution
