Spark 2.3 Executor Memory Leak

Problem Description

I am getting a memory leak warning that, ideally, was a Spark bug up to version 1.6 and has since been resolved.

Mode: Standalone
IDE: PyCharm
Spark version: 2.3
Python version: 3.6

Below are the warnings from the executor log:

2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3148
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3152
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3151
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3150
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3149
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3153
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3154
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3158
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3155
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3157
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3160
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3161
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3156
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3159
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3165
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3163
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3162
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3166

Any insight into why this happens? My job does complete successfully, though.

Edit: Many have said this is a duplicate of a question from 2 years ago, but the answer there says it was a Spark bug, whereas Spark's Jira says the issue has been resolved.

The question here is: so many releases later, why am I still seeing the same warning in Spark 2.3? If there is a valid or logical answer to my query, I will gladly delete this question.

Tags: python, python-3.x, apache-spark, memory-leaks, pyspark

Solution


According to SPARK-14168, the warning stems from not consuming an entire iterator. I encountered the same warning when taking n elements from an RDD in the Spark shell.
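
For illustration, here is a minimal PySpark sketch of the pattern that Jira ticket describes (the master URL, app name, and data sizes are my own assumptions, not from the original post): taking a few elements from a shuffled RDD leaves each task's iterator only partially consumed, which is the condition that produces this WARN.

# Minimal sketch, assuming a local PySpark installation; names and
# sizes below are illustrative, not from the original question.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("local[*]")                   # assumed local master
         .appName("managed-memory-leak-demo")  # hypothetical app name
         .getOrCreate())
sc = spark.sparkContext

# sortBy forces a shuffle; each task's shuffle reader allocates managed
# (Tungsten) memory while it streams records.
rdd = sc.parallelize(range(1_000_000), 16).sortBy(lambda x: -x)

# take() stops pulling from a partition's iterator as soon as it has
# enough rows, so the reader never reaches the point where it would
# release its buffers on its own; the task-completion cleanup releases
# them instead and logs "Managed memory leak detected".
print(rdd.take(5))

spark.stop()

If that is what is happening, the warning is benign: the executor detects the pages still held at task completion, frees them itself, and only then logs the WARN, which is consistent with the job finishing successfully.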

