apache-spark - Spark: Executor heartbeat timed out
Problem description
I am working on a Databricks cluster with 240 GB of memory and 64 cores. These are the settings I defined.
import pyspark
from pyspark.sql import SparkSession, SQLContext
from pyspark.sql.types import *
import pyspark.sql.functions as fs
from pyspark.sql.functions import col, count, countDistinct
from pyspark import SparkContext
from geospark.utils import GeoSparkKryoRegistrator, KryoSerializer
from geospark.register import upload_jars
from geospark.register import GeoSparkRegistrator

spark.conf.set("spark.sql.shuffle.partitions", 1000)
# Recommended settings for using GeoSpark
spark.conf.set("spark.driver.memory", "20g")
spark.conf.set("spark.network.timeout", "1000s")
spark.conf.set("spark.driver.maxResultSize", "10g")
spark.conf.set("spark.serializer", KryoSerializer.getName)
spark.conf.set("spark.kryo.registrator", GeoSparkKryoRegistrator.getName)
upload_jars()
SparkContext.setSystemProperty("geospark.global.charset", "utf8")
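Note that several of these settings (spark.driver.memory, spark.serializer, spark.kryo.registrator) are JVM-level options that only take effect if they are supplied before the driver starts; calling spark.conf.set on an already-running session leaves them unchanged. On Databricks the spark session is pre-created, so they belong in the cluster's Spark config instead. Outside Databricks, a minimal sketch of applying them at session creation (the app name is illustrative):

```python
from pyspark.sql import SparkSession
from geospark.utils import GeoSparkKryoRegistrator, KryoSerializer

# JVM-level options must be set before the driver JVM starts, so pass them
# to the builder (or to spark-submit) instead of spark.conf.set().
spark = (
    SparkSession.builder
    .appName("geospark-job")  # hypothetical app name
    .config("spark.driver.memory", "20g")
    .config("spark.driver.maxResultSize", "10g")
    .config("spark.network.timeout", "1000s")
    .config("spark.sql.shuffle.partitions", "1000")
    .config("spark.serializer", KryoSerializer.getName)
    .config("spark.kryo.registrator", GeoSparkKryoRegistrator.getName)
    .getOrCreate()
)
```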
I am processing a large dataset, and this is the error I got after running for several hours.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 10.0 failed 4 times, most recent failure: Lost task 3.3 in stage 10.0 (TID 6054, 10.17.21.12, executor 7):
ExecutorLostFailure (executor 7 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 170684 ms
Solution
Could you try the following options?
- Repartition the DataFrame you are working on to a larger number of partitions, for example:
df.repartition(1000)
- Increase the network timeout and the executor heartbeat interval, keeping spark.executor.heartbeatInterval strictly lower than spark.network.timeout (Spark rejects configurations where it is not):
--conf spark.network.timeout=10000000
--conf spark.executor.heartbeatInterval=6000000
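The suggestions above can be sketched in PySpark as follows. The timeout values, input path, and partition count are illustrative assumptions, not values from the question; the key constraint is that the heartbeat interval stays well below the network timeout, and that more partitions mean smaller tasks that are less likely to stall an executor long enough to miss heartbeats.

```python
from pyspark.sql import SparkSession

# Illustrative values: the heartbeat interval must stay well below the
# network timeout for Spark to accept the configuration.
spark = (
    SparkSession.builder
    .config("spark.network.timeout", "800s")
    .config("spark.executor.heartbeatInterval", "60s")
    .getOrCreate()
)

df = spark.read.parquet("/path/to/large/dataset")  # hypothetical input
df = df.repartition(1000)  # spread the work over more, smaller tasks
```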