dataframe - Error: org.apache.spark.sql.execution.datasources.FileFormatWriter$.write
Problem description
I am running on the following configuration. Cluster type: E64_v3 (1 driver + 3 workers). Other Spark configs:
spark.shuffle.io.connectionTimeout 1200s
spark.databricks.io.cache.maxMetaDataCache 40g
spark.rpc.askTimeout 1200s
spark.databricks.delta.snapshotPartitions 576
spark.databricks.optimizer.rangeJoin.binSize 256
spark.sql.inMemoryColumnarStorage.batchSize 10000
spark.sql.legacy.parquet.datetimeRebaseModeInWrite CORRECTED
spark.executor.cores 16
spark.executor.memory 54g
spark.rpc.lookupTimeout 1200s
spark.driver.maxResultSize 220g
spark.databricks.io.cache.enabled true
spark.rpc.io.backLog 256
spark.sql.shuffle.partitions 576
spark.network.timeout 1200s
spark.sql.inMemoryColumnarStorage.compressed true
spark.databricks.io.cache.maxDiskUsage 220g
spark.storage.blockManagerSlaveTimeoutMs 1200s
spark.executor.instances 12
spark.sql.windowExec.buffer.in.memory.threshold 524288
spark.executor.heartbeatInterval 100s
spark.default.parallelism 576
spark.core.connection.ack.wait.timeout 1200s
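As a sanity check on the settings above, the sketch below works out how the requested executors map onto the three workers. It assumes each E64_v3 worker exposes 64 vCPUs and 432 GiB of RAM, which is not stated in the question and should be verified against the actual cluster. The function name `summarize_cluster` is hypothetical.

```python
# Assumption: each E64_v3 worker has 64 vCPUs and 432 GiB of RAM.
# These figures are not given in the question; verify them for your cluster.
WORKERS = 3
VCPUS_PER_WORKER = 64
RAM_GIB_PER_WORKER = 432

# Values copied from the Spark configs listed above.
EXECUTOR_INSTANCES = 12
EXECUTOR_CORES = 16
EXECUTOR_MEMORY_GIB = 54
SHUFFLE_PARTITIONS = 576


def summarize_cluster():
    """Return how the requested executors spread across the workers."""
    executors_per_worker = EXECUTOR_INSTANCES // WORKERS
    total_task_slots = EXECUTOR_INSTANCES * EXECUTOR_CORES
    return {
        "executors_per_worker": executors_per_worker,
        "cores_requested_per_worker": executors_per_worker * EXECUTOR_CORES,
        "vcpus_available_per_worker": VCPUS_PER_WORKER,
        "heap_gib_requested_per_worker": executors_per_worker * EXECUTOR_MEMORY_GIB,
        "ram_gib_available_per_worker": RAM_GIB_PER_WORKER,
        "total_task_slots": total_task_slots,
        "shuffle_partitions_per_slot": SHUFFLE_PARTITIONS / total_task_slots,
    }


if __name__ == "__main__":
    for name, value in summarize_cluster().items():
        print(f"{name}: {value}")
```

Under these assumptions the executors request every vCPU on each worker, and each task slot handles three shuffle partitions per stage.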
Here is my error stack trace:
---> 41 df.write.format("delta").mode("overwrite").save(path)
/databricks/spark/python/pyspark/sql/readwriter.py in save(self, path, format, mode, partitionBy, **options)
825 self._jwrite.save()
826 else:
--> 827 self._jwrite.save(path)
Py4JJavaError: An error occurred while calling o784.save.
: org.apache.spark.SparkException: Job aborted.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:230)
.
.
.
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: ShuffleMapStage 13 (execute at DeltaInvariantCheckerExec.scala:88) has failed the maximum allowable number of times: 4. Most recent failure reason: org.apache.spark.shuffle.FetchFailedException: Failed to connect to /10.179....
Any idea how to mitigate this?
Solution
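A `FetchFailedException` with "Failed to connect" usually means a task could not reach the executor holding the shuffle blocks it needed. One common first step, offered here as a sketch rather than a confirmed fix for this job, is to raise Spark's shuffle fetch retry settings, `spark.shuffle.io.maxRetries` (default 3) and `spark.shuffle.io.retryWait` (default 5s). The helper names and the values 10 and 30 below are illustrative assumptions, not recommendations from the source. The output uses the same `key value` format as the cluster config list in the question, since these settings are normally applied at cluster start rather than at runtime.

```python
def shuffle_retry_overrides(max_retries=10, retry_wait_seconds=30):
    """Hypothetical helper: build retry-related Spark settings.

    The default arguments are illustrative values only; tune them for
    your own workload.
    """
    return {
        "spark.shuffle.io.maxRetries": str(max_retries),
        "spark.shuffle.io.retryWait": f"{retry_wait_seconds}s",
    }


def as_cluster_config_lines(overrides):
    """Render settings as 'key value' lines for a cluster Spark config box."""
    return [f"{key} {value}" for key, value in sorted(overrides.items())]


if __name__ == "__main__":
    for line in as_cluster_config_lines(shuffle_retry_overrides()):
        print(line)
```

If the remote executor was lost outright, for example because it ran out of memory, more retries will not help. In that case the executor logs around the time of the failure are the place to look.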