How to build output from sums that meet specific conditions

Problem description

I have a table, medicalclaims, with several columns of interest: patient_id, vendor, claim_id, diagnosis_code, procedure_code, state

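In short, I want to pull the total number of distinct patient_id values and the total number of distinct claim_id values, but I also want totals for certain diagnosis_code, procedure_code, and state values. For example, I would like my output columns to look like this: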

vendor | count(distinct patient_id) | count(distinct claim_id) | procedure_code = E123 | state = CA

Currently, I am using this query to get the vendor, the patient ID count, and the claim ID count:

SELECT distinct vendor, count (distinct patient_id) AS Patients, count (distinct claim_id) AS Claims
FROM medicalclaims
GROUP BY 1
ORDER BY 1

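However, I am not sure how to build the remaining columns to capture the specific diagnosis_code, procedure_code, and state values of interest. I want the total number of patients in CA or NY, plus totals for specific diagnosis codes or procedure codes. So basically, I want to add columns to the output that look like this: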

SELECT count(distinct patient_id)
FROM medicalclaims
WHERE state = 'CA'

SELECT count(distinct patient_id)
FROM medicalclaims
WHERE procedure_code = 'xyz'

Tags: sql, apache-spark-sql

Solution


I think you want:

SELECT vendor, count(distinct patient_id) AS Patients,
       count(distinct claim_id) AS Claims,
       -- each sum counts the rows in the group that match the condition
       sum(case when procedure_code = 'xyz' then 1 else 0 end) as procedure_xyz,
       sum(case when state = 'CA' then 1 else 0 end) as state_ca
FROM medicalclaims
GROUP BY 1
ORDER BY 1
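
Note that sum(case ...) counts matching rows, so a patient with several CA rows is counted once per row. The question asks for count(distinct patient_id) under each condition, so a variant that counts distinct patients may be closer to the intent. The following is a minimal sketch, not a definitive answer: the inline sample rows, the column aliases (patients_ca, patients_e123, rows_ca), and the use of a VALUES inline table are assumptions made for illustration, and the real medicalclaims table would take the place of the WITH clause.

```sql
-- Hypothetical sample rows standing in for the real medicalclaims table.
WITH medicalclaims AS (
  SELECT * FROM VALUES
    ('p1', 'A', 'c1', 'E123', 'CA'),
    ('p1', 'A', 'c2', 'E123', 'CA'),
    ('p2', 'A', 'c3', 'X999', 'NY'),
    ('p3', 'B', 'c4', 'E123', 'CA')
  AS t(patient_id, vendor, claim_id, procedure_code, state)
)
SELECT vendor,
       count(distinct patient_id) AS patients,
       count(distinct claim_id)   AS claims,
       -- CASE without ELSE yields NULL for non-matching rows,
       -- and count(distinct ...) ignores NULLs
       count(distinct case when state = 'CA' then patient_id end) AS patients_ca,
       count(distinct case when procedure_code = 'E123' then patient_id end) AS patients_e123,
       -- row-level count, shown for comparison with the distinct count
       sum(case when state = 'CA' then 1 else 0 end) AS rows_ca
FROM medicalclaims
GROUP BY vendor
ORDER BY vendor
```

With these sample rows, vendor A has two CA rows that both belong to patient p1, so rows_ca is 2 while patients_ca is 1. That difference is the reason to pick one form or the other deliberately.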
