Spark job does not finish: showing a DataFrame

Question

I have to merge 5 DataFrames into a single DataFrame. The DataFrames look like this:

+-------------------+---------------------------------------------------------------------------+
|Timestamp          |sentence                                                                   |
+-------------------+---------------------------------------------------------------------------+
|2020-03-13 12:01:32| : 0792b8d1-7ad9-43fc-9e75-9b1f2612834c updated field1 with beats|
+-------------------+---------------------------------------------------------------------------+
+-------------------+------------------------------------------------------------------------+
|Timestamp          |sentence                                                                |
+-------------------+------------------------------------------------------------------------+
|2020-03-04 23:10:59| : 0792b8d1-7ad9-43fc-9e75-9b1f2612834c updated field2 with kobo |
+-------------------+------------------------------------------------------------------------+
+-------------------+------------------------------------------------------------------------+
|Timestamp          |sentence                                                                |
+-------------------+------------------------------------------------------------------------+
|2020-03-13 12:01:32| : 0792b8d1-7ad9-43fc-9e75-9b1f2612834c updated field3 with beats|
+-------------------+------------------------------------------------------------------------+

+-------------------+-------------------------------------------------------------------+
|Timestamp          |sentence                                                           |
+-------------------+-------------------------------------------------------------------+
|2020-02-20 07:20:29| : 0792b8d1-7ad9-43fc-9e75-9b1f2612834c added an field4 with beats|
+-------------------+-------------------------------------------------------------------+

+-------------------+---------------------------------------------------------------+
|Timestamp          |sentence                                                       |
+-------------------+---------------------------------------------------------------+
|2020-02-20 07:20:29| : 0792b8d1-7ad9-43fc-9e75-9b1f2612834c added a field5 with beats|
+-------------------+---------------------------------------------------------------+

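When the union is applied to the first 3 DataFrames, show works fine, but once the last two are included, the Spark job makes no progress.

To do the union I use: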

from functools import reduce

dfs = [df1, df2, df3, df4, df5]
# Fold the list into one DataFrame by chaining positional unions
df_final = reduce(lambda a, b: a.union(b), dfs)
df_final.show()

I want to display the result, but the job gets stuck at showString at NativeMethodAccessorImpl.java:0

How can I solve this?

Tags: python, apache-spark, pyspark, apache-spark-sql, pyspark-dataframes

Answer


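This looks fine to me: the DataFrames have the same data types, which is what union needs, and the same column names, which is what unionByName needs.

I don't think union or unionByName is the problem; something else is probably going on. From the scheduler's point of view it could be a resource crunch. Check whether any other jobs are running in parallel.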
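
The schema claim above can be checked before the union runs, which helps rule it out as the cause. The following is a minimal sketch, not part of the original answer: the helper names check_same_schema and union_all are hypothetical, and the code assumes only that each input exposes dtypes (a list of (column name, type string) tuples, as PySpark DataFrames do) and a unionByName method.

```python
from functools import reduce


def check_same_schema(dfs):
    """Raise ValueError if the inputs do not share the same (name, type) columns.

    Hypothetical helper: it reads only the `dtypes` attribute, which PySpark
    DataFrames expose as a list of (column_name, type_string) tuples.
    Column order is ignored because the union below matches columns by name.
    """
    expected = sorted(dfs[0].dtypes)
    for i, df in enumerate(dfs[1:], start=1):
        if sorted(df.dtypes) != expected:
            raise ValueError(
                f"DataFrame at index {i} has columns {df.dtypes}, "
                f"expected {dfs[0].dtypes}"
            )


def union_all(dfs):
    """Verify that the schemas match, then chain unionByName over the list."""
    check_same_schema(dfs)
    return reduce(lambda a, b: a.unionByName(b), dfs)
```

With the DataFrames from the question, this would be called as union_all([df1, df2, df3, df4, df5]).show(). If the check passes and show still hangs, the schemas are not the issue, which is consistent with looking at cluster resources next.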

