Two SELECTs and one GROUP BY in a single spark.sql statement

Problem description

I am trying this code:

spark.sql('''SELECT CD_CLI,
                CASE WHEN SUM(in_lim_crt) > 0 
                    THEN ROUND(SUM(SUM_VL_TTL_FAT)/SUM(in_lim_crt), 4)
                    ELSE -99999999999
                    END AS VL_MED_FAT,
                '2017-01-31' as DT_MVTC
         FROM in_lim_fat
         WHERE DT_MVTC BETWEEN '2016-07-31' AND '2016-12-31'
         UNION
         SELECT CD_CLI,
                 MAX(VL_RPTD_UTZO) AS MAX_VL_RPTD_UTZO,
                 '2017-01-31' AS DT_MVTC
         FROM vl_rptd_utzo
         WHERE DT_EXTC BETWEEN '2016-07-31' AND '2016-12-31'
         GROUP BY CD_CLI
      ''').createOrReplaceTempView('media_vl_fatura_2017_01_31')

I get the following error:

SparkStatementException: "grouping expressions sequence is empty, and 'in_lim_fat.`CD_CLI`' is 
not an aggregate function. Wrap '(CASE WHEN (sum(CAST(in_lim_fat.`in_lim_crt` AS BIGINT)) > 
CAST(0 AS BIGINT)) THEN round((CAST(sum(in_lim_fat.`SUM_VL_TTL_FAT`) AS DECIMAL(33,2)) / 
CAST(CAST(sum(CAST(in_lim_fat.`in_lim_crt` AS BIGINT)) AS DECIMAL(20,0)) AS DECIMAL(33,2))), 
4) ELSE CAST(-99999999999L AS DECIMAL(38,4)) END AS `VL_MED_FAT`)' in windowing func.....

The error message is quite long, so I have only pasted the beginning of it.

What exactly am I doing wrong? I want to create this table as the union of two SELECTs, but I don't know spark.sql very well. Is my query constructed incorrectly? Does it just not work this way?

Tags: sql, pyspark, apache-spark-sql

Solution
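The original page's solution section is empty, but the likely cause and fix follow from the error text. "grouping expressions sequence is empty" means the first SELECT uses aggregates (`SUM`) alongside the non-aggregated column `CD_CLI` without a GROUP BY of its own: in a UNION, a GROUP BY clause attaches only to the SELECT immediately preceding it, so the `GROUP BY CD_CLI` at the end covers only the second branch. Adding `GROUP BY CD_CLI` to the first branch should resolve the error. A minimal sketch of the corrected query, run here against SQLite instead of Spark (the same SQL rule applies to this construct; table and column names are taken from the question, and the sample rows are invented for illustration):

```python
# Sketch of the likely fix. SQLite stands in for Spark here; the sample
# data is invented, only the table/column names come from the question.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE in_lim_fat   (CD_CLI, in_lim_crt, SUM_VL_TTL_FAT, DT_MVTC);
    CREATE TABLE vl_rptd_utzo (CD_CLI, VL_RPTD_UTZO, DT_EXTC);
    INSERT INTO in_lim_fat VALUES
        (1, 10, 25.0, '2016-08-31'),
        (1, 10, 35.0, '2016-09-30'),
        (2,  0,  5.0, '2016-08-31');
    INSERT INTO vl_rptd_utzo VALUES
        (1, 100.0, '2016-08-31'),
        (1, 150.0, '2016-09-30');
""")

rows = conn.execute("""
    SELECT CD_CLI,
           CASE WHEN SUM(in_lim_crt) > 0
                THEN ROUND(SUM(SUM_VL_TTL_FAT) / SUM(in_lim_crt), 4)
                ELSE -99999999999
           END AS VL_MED_FAT,
           '2017-01-31' AS DT_MVTC
    FROM in_lim_fat
    WHERE DT_MVTC BETWEEN '2016-07-31' AND '2016-12-31'
    GROUP BY CD_CLI  -- the missing clause: each UNION branch needs its own
    UNION
    SELECT CD_CLI,
           MAX(VL_RPTD_UTZO) AS MAX_VL_RPTD_UTZO,
           '2017-01-31' AS DT_MVTC
    FROM vl_rptd_utzo
    WHERE DT_EXTC BETWEEN '2016-07-31' AND '2016-12-31'
    GROUP BY CD_CLI
""").fetchall()
```

In Spark, the same corrected SQL string can be passed to `spark.sql(...)` and registered with `createOrReplaceTempView` as in the question. Two further details worth checking: the UNION's column names come from its first branch (so the second branch's `MAX_VL_RPTD_UTZO` alias is discarded and that column surfaces as `VL_MED_FAT`), and `UNION` deduplicates rows, so `UNION ALL` may be what is actually intended here since the two branches carry different measures.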
