What is "not equal" in spark.sql()?

Problem description

I am practicing with PySpark's spark.sql() function. When I try to write a not-equal condition in Spark, I can't seem to use <>, != or NOT to build a more complex query.

Sample data and query:

+--------------------+-------------+--------------------+
|               Party|       Handle|               Tweet|
+--------------------+-------------+--------------------+
|            Democrat|RepDarrenSoto|Today, Senate Dem...|
|            Democrat|RepDarrenSoto|RT @WinterHavenSu...|
|            Democrat|RepDarrenSoto|RT @NBCLatino: .@...|
|Congress has allo...|         null|                null|
|            Democrat|RepDarrenSoto|RT @NALCABPolicy:...|
|            Democrat|RepDarrenSoto|RT @Vegalteno: Hu...|
|            Democrat|RepDarrenSoto|RT @EmgageActionF...|
|            Democrat|RepDarrenSoto|Hurricane Maria l...|
|            Democrat|RepDarrenSoto|RT @Tharryry: I a...|
|            Democrat|RepDarrenSoto|RT @HispanicCaucu...|
|            Democrat|RepDarrenSoto|RT @RepStephMurph...|
|            Democrat|RepDarrenSoto|RT @AllSaints_FL:...|
|            Democrat|RepDarrenSoto|.@realDonaldTrump...|
|            Democrat|RepDarrenSoto|Thank you to my m...|
|            Democrat|RepDarrenSoto|We paid our respe...|
|Sgt Sam Howard - ...|         null|                null|
|            Democrat|RepDarrenSoto|RT @WinterHavenSu...|
|            Democrat|RepDarrenSoto|Meet 12 incredibl...|
|            Democrat|RepDarrenSoto|RT @wildlifeactio...|
|            Democrat|RepDarrenSoto|RT @CHeathWFTV: K...|
+--------------------+-------------+--------------------+

spark.sql("""select Party from tweets_tempview where Party <>'Democrat' or 'Republican' """).show(20,False)

Error message:

"cannot resolve '((NOT (tweets_tempview.`Party` = 'Democrat')) OR 'Republican')' due to data type mismatch: differing types in '((NOT (tweets_tempview.`Party` = 'Democrat')) OR 'Republican')' (boolean and string).; line 1 pos 40;\n'Project ['Party]\n+- 'Filter (NOT (Party#98 = Democrat) || Republican)\n   +- SubqueryAlias `tweets_tempview`\n      +- Relation[Party#98,Handle#99,Tweet#100] csv\n"

What Spark SQL syntax makes the where clause apply to both values?

Tags: pyspark, apache-spark-sql

Solution


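You cannot compare a column against two strings with a single <> operator. In the query above, or 'Republican' is parsed as a standalone string operand of OR rather than as a second comparison, which is why the error reports a type mismatch between boolean and string. Either use: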

where Party <> 'Democrat' and Party <> 'Republican'

Or, as suggested in the comments, use:

where Party not in ('Democrat', 'Republican')
