Error: org.apache.spark.sql.execution.datasources.FileFormatWriter$.write

Problem description

I am running on the following configuration: cluster type E64_v3 (1 driver + 3 workers). Other Spark configs:

spark.shuffle.io.connectionTimeout 1200s 
spark.databricks.io.cache.maxMetaDataCache 40g 
spark.rpc.askTimeout 1200s 
spark.databricks.delta.snapshotPartitions 576 
spark.databricks.optimizer.rangeJoin.binSize 256 
spark.sql.inMemoryColumnarStorage.batchSize 10000 
spark.sql.legacy.parquet.datetimeRebaseModeInWrite CORRECTED 
spark.executor.cores 16 
spark.executor.memory 54g 
spark.rpc.lookupTimeout 1200s 
spark.driver.maxResultSize 220g 
spark.databricks.io.cache.enabled true 
spark.rpc.io.backLog 256 
spark.sql.shuffle.partitions 576 
spark.network.timeout 1200s 
spark.sql.inMemoryColumnarStorage.compressed true 
spark.databricks.io.cache.maxDiskUsage 220g 
spark.storage.blockManagerSlaveTimeoutMs 1200s 
spark.executor.instances 12 
spark.sql.windowExec.buffer.in.memory.threshold 524288 
spark.executor.heartbeatInterval 100s 
spark.default.parallelism 576 
spark.core.connection.ack.wait.timeout 1200s
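
For context, a minimal sketch of how settings like these are applied (assuming a Databricks notebook where spark is the active SparkSession; this is illustrative, not taken from the original post):

# SQL/runtime confs can be changed on a live session:
spark.conf.set("spark.sql.shuffle.partitions", "576")
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "CORRECTED")
spark.conf.set("spark.databricks.io.cache.enabled", "true")

# JVM-level settings such as spark.executor.memory, spark.executor.cores and
# the RPC/network timeouts are read at executor startup; on Databricks they
# belong in the cluster's "Spark config" section, not in notebook code.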

Here is my error stack trace:

---> 41     df.write.format("delta").mode("overwrite").save(path) 
/databricks/spark/python/pyspark/sql/readwriter.py in save(self, path, format, mode, partitionBy, **options)
825             self._jwrite.save()
826         else:
--> 827             self._jwrite.save(path)

Py4JJavaError: An error occurred while calling o784.save.
: org.apache.spark.SparkException: Job aborted.
at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:230)
...
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: ShuffleMapStage 13 (execute at DeltaInvariantCheckerExec.scala:88) has failed the maximum allowable number of times: 4. Most recent failure reason: org.apache.spark.shuffle.FetchFailedException: Failed to connect to /10.179....

Any idea how to mitigate this?

Tags: dataframe, pyspark, databricks, delta-lake

Solution
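
The FetchFailedException ("Failed to connect to /10.179...") means a reducer task could not pull shuffle blocks from a remote executor, and the stage was retried the default maximum of 4 times. This usually indicates the serving executor died (commonly from memory pressure or long GC pauses) rather than a network timeout, which the configs above have already raised generously. A sketch of commonly suggested mitigations, assuming the shuffle feeding the Delta write is producing oversized partitions (not a verified fix for this specific job):

# 1. More, smaller shuffle partitions so each fetch moves less data
#    (a runtime SQL conf, settable in the notebook):
spark.conf.set("spark.sql.shuffle.partitions", "2048")   # up from 576

# 2. Explicitly spread the data so no single task builds an oversized block:
df = df.repartition(2048)

# 3. In the cluster's Spark config (read at executor startup), retry shuffle
#    fetches harder before the stage is failed:
#    spark.shuffle.io.maxRetries 10
#    spark.shuffle.io.retryWait 30s

df.write.format("delta").mode("overwrite").save(path)

Separately, with spark.executor.cores 16 and spark.executor.memory 54g, each concurrently running task gets only a few GB of heap; if the failed executor's logs show OutOfMemoryError, reducing cores per executor is another common lever. Raising spark.stage.maxConsecutiveAttempts only masks the underlying executor loss, so checking those logs first is usually more productive.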

