Spark df.cache() causes org.apache.spark.memory.SparkOutOfMemoryError

Problem description

Everything runs fine, but as soon as I add df.cache() the job fails with org.apache.spark.memory.SparkOutOfMemoryError.

Has anyone run into a similar problem?

Thanks!

This is the code that triggers the problem:

df.cache()
df = df.select(
    *df.columns,
    f.greatest(*custom_columns).alias("custom_max"),
    f.least(*custom_columns).alias("custom_min"),
)
return df
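For reference, `f.greatest` and `f.least` compute the row-wise maximum and minimum across the listed columns. A plain-Python sketch (not Spark code, with made-up sample rows) of what the two derived columns hold:

```python
# Row-wise analogue of f.greatest / f.least over custom_columns.
# Sample data is hypothetical, only to illustrate the semantics.
rows = [
    {"a": 1, "b": 5, "c": 3},
    {"a": 7, "b": 2, "c": 4},
]
custom_columns = ["a", "b", "c"]

for row in rows:
    # greatest -> max of the column values in this row
    row["custom_max"] = max(row[c] for c in custom_columns)
    # least -> min of the column values in this row
    row["custom_min"] = min(row[c] for c in custom_columns)
```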

This is the error log:

[2021-04-17 08:54:35,894] {base_task_runner.py:115} INFO - Job 19: Subtask compute_outcomes [Stage 2:=======================================>               (144 + 4) / 200]
[2021-04-17 08:54:54,481] {base_task_runner.py:115} INFO - Job 19: Subtask compute_outcomes [Stage 2:========================================>              (148 + 4) / 200]2021-04-17 08:54:54 WARN  MemoryStore:66 - Failed to reserve initial memory threshold of 1024.0 KB for computing block rdd_173_11 in memory.
[2021-04-17 08:54:54,494] {base_task_runner.py:115} INFO - Job 19: Subtask compute_outcomes 2021-04-17 08:54:54 WARN  MemoryStore:66 - Not enough space to cache rdd_173_11 in memory! (computed 384.0 B so far)
[2021-04-17 08:54:54,495] {base_task_runner.py:115} INFO - Job 19: Subtask compute_outcomes 2021-04-17 08:54:54 WARN  MemoryStore:66 - Failed to reserve initial memory threshold of 1024.0 KB for computing block rdd_173_17 in memory.
[2021-04-17 08:54:54,496] {base_task_runner.py:115} INFO - Job 19: Subtask compute_outcomes 2021-04-17 08:54:54 WARN  MemoryStore:66 - Not enough space to cache rdd_173_17 in memory! (computed 384.0 B so far)
[2021-04-17 08:54:54,505] {base_task_runner.py:115} INFO - Job 19: Subtask compute_outcomes 2021-04-17 08:54:54 WARN  BlockManager:66 - Persisting block rdd_173_11 to disk instead.
====
There are a bunch of WARN like this in between
====
[2021-04-17 08:55:13,178] {base_task_runner.py:115} INFO - Job 19: Subtask compute_outcomes 2021-04-17 08:55:13 WARN  MemoryStore:66 - Not enough space to cache rdd_285_0 in memory! (computed 384.0 B so far)
[2021-04-17 08:55:13,242] {base_task_runner.py:115} INFO - Job 19: Subtask compute_outcomes 2021-04-17 08:55:13 ERROR Executor:91 - Exception in task 4.0 in stage 2.0 (TID 154)
[2021-04-17 08:55:13,242] {base_task_runner.py:115} INFO - Job 19: Subtask compute_outcomes org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire 65536 bytes of memory, got 0
[2021-04-17 08:55:13,242] {base_task_runner.py:115} INFO - Job 19: Subtask compute_outcomes     at org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
[2021-04-17 08:55:13,242] {base_task_runner.py:115} INFO - Job 19: Subtask compute_outcomes     at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
[2021-04-17 08:55:13,242] {base_task_runner.py:115} INFO - Job 19: Subtask compute_outcomes     at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.<init>(UnsafeInMemorySorter.java:128)
[2021-04-17 08:55:13,242] {base_task_runner.py:115} INFO - Job 19: Subtask compute_outcomes     at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:161)
[2021-04-17 08:55:13,242] {base_task_runner.py:115} INFO - Job 19: Subtask compute_outcomes     at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:128)
[2021-04-17 08:55:13,243] {base_task_runner.py:115} INFO - Job 19: Subtask compute_outcomes     at org.apache.spark.sql.execution.UnsafeExternalRowSorter.<init>(UnsafeExternalRowSorter.java:108)
[2021-04-17 08:55:13,243] {base_task_runner.py:115} INFO - Job 19: Subtask compute_outcomes     at org.apache.spark.sql.execution.UnsafeExternalRowSorter.create(UnsafeExternalRowSorter.java:93)
[2021-04-17 08:55:13,243] {base_task_runner.py:115} INFO - Job 19: Subtask compute_outcomes     at org.apache.spark.sql.execution.SortExec.createSorter(SortExec.scala:87)
[2021-04-17 08:55:13,243] {base_task_runner.py:115} INFO - Job 19: Subtask compute_outcomes     at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:102)
[2021-04-17 08:55:13,243] {base_task_runner.py:115} INFO - Job 19: Subtask compute_outcomes     at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:101)
[2021-04-17 08:55:13,243] {base_task_runner.py:115} INFO - Job 19: Subtask compute_outcomes     at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
[2021-04-17 08:55:13,243] {base_task_runner.py:115} INFO - Job 19: Subtask compute_outcomes     at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
[2021-04-17 08:55:13,243] {base_task_runner.py:115} INFO - Job 19: Subtask compute_outcomes     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
[2021-04-17 08:55:13,243] {base_task_runner.py:115} INFO - Job 19: Subtask compute_outcomes     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
[2021-04-17 08:55:13,243] {base_task_runner.py:115} INFO - Job 19: Subtask compute_outcomes     at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
[2021-04-17 08:55:13,244] {base_task_runner.py:115} INFO - Job 19: Subtask compute_outcomes     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
[2021-04-17 08:55:13,244] {base_task_runner.py:115} INFO - Job 19: Subtask compute_outcomes     at org.apache.spark.scheduler.Task.run(Task.scala:109)
[2021-04-17 08:55:13,244] {base_task_runner.py:115} INFO - Job 19: Subtask compute_outcomes     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
[2021-04-17 08:55:13,244] {base_task_runner.py:115} INFO - Job 19: Subtask compute_outcomes     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[2021-04-17 08:55:13,244] {base_task_runner.py:115} INFO - Job 19: Subtask compute_outcomes     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[2021-04-17 08:55:13,244] {base_task_runner.py:115} INFO - Job 19: Subtask compute_outcomes     at java.lang.Thread.run(Thread.java:748)

Tags: python, apache-spark, pyspark

Solution


df.cache() stores your entire DataFrame in the executors' memory, so this error means your executors do not have enough of it: the cached blocks crowd out the working memory that tasks need (the stack trace fails while SortExec is allocating its sorter). You need to give the executors more memory with --executor-memory.
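A minimal sketch of the submit command; the memory values, overhead, and script name below are placeholders, so adjust them to your cluster and workload:

```shell
# Hypothetical spark-submit invocation: raise executor memory so both the
# cached blocks and each task's working memory fit. 4g / 1g are
# illustrative values, not recommendations for every job.
spark-submit \
  --executor-memory 4g \
  --conf spark.executor.memoryOverhead=1g \
  your_job.py
```

If you cannot add memory, alternatives are to cache to disk only (e.g. `df.persist(StorageLevel.DISK_ONLY)` in PySpark) or to drop the `cache()` call entirely if the DataFrame is only read once.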

