Spark SQL: passing a variable into a query

Problem description

I have searched everywhere for an answer and tried everything, but nothing seems to work. I am trying to reference a variable assignment inside a spark.sql query in Python. I am running Python 3 and Spark 2.3.1.

bkt = 1

prime = spark.sql(s"SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts\
                FROM pwrcrv_tmp\
                where EXTR_CURR_NUM_CYC_DLQ=$bkt\
                and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')\
                group by ((year(fdr_date))*100)+month(fdr_date)\
                order by ((year(fdr_date))*100)+month(fdr_date)")

prime.show(50)

Error:

prime = spark.sql(s"SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts                FROM pwrcrv_tmp         where EXTR_CURR_NUM_CYC_DLQ=$bkt                and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')                group by ((year(fdr_date))*100)+month(fdr_date)                order by ((year(fdr_date))*100)+month(fdr_date)")
                                                                                                                                                                                                                                                                                                                                                                                         ^
SyntaxError: invalid syntax

Tags: python, pyspark

Solution

I found the correct syntax in this Databricks forum post:

https://forums.databricks.com/questions/115/how-do-i-pass-parameters-to-my-sql-statements.html

Prefix the query string with a lowercase f (a Python f-string) and wrap the variable name in curly braces inside the query. The s"..." prefix in the original code is Scala's string-interpolation syntax; it is invalid in Python, which is what triggers the SyntaxError.

bkt = 1

prime = spark.sql(f"SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts\
            FROM pwrcrv_tmp\
            where EXTR_CURR_NUM_CYC_DLQ={bkt}\
            and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')\
            group by ((year(fdr_date))*100)+month(fdr_date)\
            order by ((year(fdr_date))*100)+month(fdr_date)")


prime.show(50)
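
If you want to verify the pattern end to end, here is a minimal, self-contained sketch of the same f-string approach. The SparkSession setup and the sample rows are assumptions made purely for illustration; the table name pwrcrv_tmp, the column names, and bkt come from the question.

from datetime import date
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("fstring_demo").getOrCreate()

# Tiny made-up dataset with the columns used in the question.
rows = [
    (date(2018, 1, 15), 1, "PS"),
    (date(2018, 1, 20), 1, "PT"),
    (date(2018, 2, 3), 2, "PS"),
]
df = spark.createDataFrame(rows, ["fdr_date", "EXTR_CURR_NUM_CYC_DLQ", "EXTR_ACCOUNT_TYPE"])
df.createOrReplaceTempView("pwrcrv_tmp")

bkt = 1

# Python substitutes {bkt} before the string reaches spark.sql,
# so Spark only ever sees the literal value 1 in the WHERE clause.
prime = spark.sql(f"""
    SELECT (year(fdr_date) * 100) + month(fdr_date) AS fdr_year,
           count(*) AS counts
    FROM pwrcrv_tmp
    WHERE EXTR_CURR_NUM_CYC_DLQ = {bkt}
      AND EXTR_ACCOUNT_TYPE IN ('PS', 'PT', 'PD', 'PC', 'HV', 'PA')
    GROUP BY (year(fdr_date) * 100) + month(fdr_date)
    ORDER BY (year(fdr_date) * 100) + month(fdr_date)
""")

prime.show(50)

Note that an f-string simply pastes the value into the SQL text, so it is only appropriate for values you control, and f-strings require Python 3.6 or newer.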
