PySpark where clause condition

Problem description

I'm getting a syntax error in the following query:

df_result = df_checkout.join(df_checkin,
                             (
                             (df_checkout.product == df_checkin.product)
                             (df_checkout.host == df_checkin.host)
                             ),
                             how = 'full_outer').where(df_checkout.rank =
                                 F.when(((df_checkout.rank = df_checkin.rank)
                                         and (F.unix_timestamp(df_checkout.checkout_date, 'MM/dd/YYYY HH:MI:SS')
                                              <= F.unix_timestamp(df_checkin.checkin_date, 'MM/dd/YYYY HH:MI:SS'))),
                                        (df_checkin.rank - 1))
                                  .when(((df_checkout.rank = df_checkin.rank)
                                         and (F.unix_timestamp(df_checkout.checkout_date, 'MM/dd/YYYY HH:MI:SS')
                                              >= F.unix_timestamp(df_checkin.checkin_date, 'MM/dd/YYYY HH:MI:SS'))),
                                        df_checkin.rank)
                                  .otherwise(None)
                                 )

What is my error?

Tags: python, apache-spark, pyspark

Solution

You have an = where a comparison needs ==. Inside an expression such as where(...), df_checkout.rank = ... is Python assignment syntax, which is not allowed there and is what raises the SyntaxError:

(df_checkout.rank = df_checkin.rank)

should be

(df_checkout.rank == df_checkin.rank)
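
That is the immediate syntax error, but a few more problems will surface once it is fixed: the two join conditions have no operator between them, so Python reads (a)(b) as a function call (they need & between them); the Python and keyword cannot combine Spark Column objects (use & with each comparison parenthesized); and the pattern 'MM/dd/YYYY HH:MI:SS' is not valid for unix_timestamp, since YYYY means week-year rather than year and MI is not a pattern letter (minutes and seconds are mm and ss). A corrected sketch, keeping your column names and assuming the timestamps look like 03/15/2021 13:45:30:

    from pyspark.sql import functions as F

    # Assumption: the timestamp strings match this calendar-date pattern.
    TS_FMT = 'MM/dd/yyyy HH:mm:ss'

    df_result = df_checkout.join(
        df_checkin,
        # & combines the two join conditions into one Column expression
        (df_checkout.product == df_checkin.product)
        & (df_checkout.host == df_checkin.host),
        how='full_outer'
    ).where(
        df_checkout.rank ==                # comparison, not assignment
        F.when(
            (df_checkout.rank == df_checkin.rank)
            & (F.unix_timestamp(df_checkout.checkout_date, TS_FMT)
               <= F.unix_timestamp(df_checkin.checkin_date, TS_FMT)),
            df_checkin.rank - 1
        ).when(
            (df_checkout.rank == df_checkin.rank)
            & (F.unix_timestamp(df_checkout.checkout_date, TS_FMT)
               >= F.unix_timestamp(df_checkin.checkin_date, TS_FMT)),
            df_checkin.rank
        ).otherwise(None)
    )

One thing to keep in mind: rows where the when chain falls through to otherwise(None) compare as NULL, and where() drops NULL results, which may not be what you want after a full outer join.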
