Pyspark: filtering rows based on a constant value

Problem description

+------------+---------+----------+-----------+
|     part_no|prod_week| daily_qty|lineoffdate|
+------------+---------+----------+-----------+
|019990616100|   202004| 000000000| 2020-01-23|
|019990616100|   202004| 000000000| 2020-01-24|
|019990616100|   202004| 000000000| 2020-01-25|
|019990616100|   202005| 000000000| 2020-01-26|
|019990616100|   202005| 000000000| 2020-01-27|
|019990616100|   202005| 000000001| 2020-01-28|
|019990616100|   202005| 000000000| 2020-01-29|
|019990616100|   202005| 000000000| 2020-01-30|
|019990616100|   202005| 000000000| 2020-01-31|
|019990616100|   202005| 000000000| 2020-02-01|
|019990616100|   202006| 000000000| 2020-02-02|
|019990616100|   202006| 000000000| 2020-02-03|
|019990616100|   202006| 000000000| 2020-02-04|
|019990616100|   202006| 000000000| 2020-02-05|
|019990616100|   202006| 000000000| 2020-02-06|
|019990616100|   202006| 000000000| 2020-02-07|
|019990616100|   202006| 000000000| 2020-02-08|
|019990616100|   202007| 000000000| 2020-02-09|
|019990616100|   202007| 000000000| 2020-02-10|
|019990616100|   202007| 000000000| 2020-02-11|
+------------+---------+----------+-----------+

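I want to drop (filter out) the rows whose daily_qty value is "000000000". daily_qty is a string column. I tried the combinations below, but the filter does not seem to have any effect. Can someone help me see where I am going wrong?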

from pyspark.sql import functions as F

ds1 = ds.filter(F.col('daily_qty') != '000000000')
# Other attempts that gave the same result:
#ds1 = ds.filter(F.col('daily_qty') != F.lit('000000000'))
#ds1 = ds.filter(~F.col('daily_qty').isin(['000000000']))

Thanks, Aruna

Tags: dataframe, filter, pyspark, aws-glue

Solution

