apache-spark - Spark EMR 作业失败:原因:org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire 16384 bytes of memory, got 0
问题描述
我正在使用 Spark 2.4.6 和 EMR 5.31.0 运行具有 12 个执行程序、m5.4xlarge 实例类型、51GB 堆的 Spark Scala EMR 作业。在我的 EMR 集群中,我还具有以下配置:
[{"classification":"spark", "properties":{"maximizeResourceAllocation":"true"}, "configurations":[]}]
我不明白我的 OOM 错误来自哪里以及我能做些什么来解决它。在我写之前,我做了一个join
, groupBy
,groupBy
和另一个join
. 如何找出导致 OOM 错误的原因?
DF.withColumn("hour", hour(col("start")))
.withColumn("day", dayofmonth(col("start")))
.withColumn("month", month(col("start")))
.withColumn("year", year(col("start")))
.write
.partitionBy("year", "month", "day", "hour")
.mode(SaveMode.Overwrite)
.parquet(outputPath)
Caused by: org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire 16384 bytes of memory, got 0
at org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.<init>(UnsafeInMemorySorter.java:128)
at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:161)
at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:128)
at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArray.add(ExternalAppendOnlyUnsafeRowArray.scala:115)
at org.apache.spark.sql.execution.window.WindowExec$$anonfun$11$$anon$1.fetchNextPartition(WindowExec.scala:343)
at org.apache.spark.sql.execution.window.WindowExec$$anonfun$11$$anon$1.next(WindowExec.scala:369)
at org.apache.spark.sql.execution.window.WindowExec$$anonfun$11$$anon$1.next(WindowExec.scala:303)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage22.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:585)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:188)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
DAG Visulization of Failed Stage Aggregated Metrics By Executor of Failed Stage Executors 选项卡
解决方案
推荐阅读
- php - PHP 致命错误:允许的内存大小为 1073741824 字节已用尽(试图分配 16777216 字节)
- java - Java android EventBus 得到两个相同的事件
- intellij-idea - 在 Intellij 中传递“-parameters”选项
- javascript - 使用 Lodash 在 Nodejs 中按多个字段分组
- scala - 如何将一个集合添加到 Scala 中的集合列表中
- postgresql - Postgres 中的简单存储过程
- php - 如何从 PHP 中的 JSON 数组中获取常用值?
- jquery - 使用正则表达式验证 2dp 十进制值
- javascript - 无法将函数绑定到 ajax 请求
- android - BLE Gatt onConnectionStateChanged 在 Android 中失败,状态为 257