首页 > 解决方案 > How to get count and use that count in a calculation with out nested query?

问题描述

I'm trying to get a count and a rate using that count. Every query I've tried that does not used a nested query gives me a grouping error as the tuple that uses the count in the calculation cannot be in the group by expression.

The query below gets me the data I'm looking for. Is there a way to write this query as a single select with out the nested query?

select
  org,
  format_number(distinct_preg,0),
  format_number(total_patients,0),
  format_number((distinct_preg/total_patients * 100),2) preg_rate
from (
  select
    preg.org,
    count(distinct preg.patient_id) distinct_preg,
    tot.total_patients total_patients
  from
    pregnancy preg
    join (
      select org, count(distinct patient_id) total_patients from enc group by 1
    ) tot on preg.org = tot.org
  group by 1,3
)
order by 1

--- UPDATE ----------------------------

This seems to have something to do with the format_number() function.

This query works:

select
  preg.org,
  format_number(count(distinct preg.patient_id),0) distinct_preg,
  tot.total_patients total_patients,
  format_number((count(distinct preg.patient_id) / tot.total_patients * 100),2) preg_rate
from
  pregnancy preg
  join (
    select org, count(distinct patient_id) total_patients from enc group by 1
  ) tot on preg.org = tot.org
group by 1,3

This one gives the error shown and differs only by the format call:

select
  preg.org,
  format_number(count(distinct preg.patient_id),0) distinct_preg,
  format_number(tot.total_patients,0) total_patients,
  format_number((count(distinct preg.patient_id) / tot.total_patients * 100),2) preg_rate
from
  pregnancy preg
  join (
    select org, count(distinct patient_id) total_patients from enc group by 1
  ) tot on preg.org = tot.org
group by 1,3

enter image description here

标签: sqlapache-sparkdatabricks

解决方案


If you want the average number of pregnancies per patient, you can use:

select org, count(*) as num_pregnancies, count(distinct patient_id) as num_patients,
       count(*) * 1.0 / count(distinct patient_id) as avg_pregnancies_per_patient
from pregnancies p
group by org;

Without a clear explanation of what you are trying to do, this is a bit of speculation. However, it seems like a very reasonable metric and one you are attempting to calculate.


推荐阅读