How to efficiently count the categories (250+) of categorical attributes? PostgreSQL or Python

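Problem description

I have a large database with 50 attributes (8 of them categorical), and I need to create a summary with the counts of every category of each variable, grouped by city and state. One of the attributes has more than 250 categories.

So far I have been able to write a query that counts one category of one attribute at a time, grouped by city, and export the result to csv.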

(select city as "City", COUNT(use4) as "use2056"
from demo
where use4 = '2056'
group by city
order by city asc)

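I was considering copying and pasting the results by hand (I know that would take forever), but each query returns a different set of rows. In addition, some cities in the US share the same name (and I will eventually need to visualize this). I tried putting several SELECTs into one query, but I could not get it to work.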


Select
(select city as "City", COUNT(use4) as "use2056"
from demo
where use4 = '2056'
group by city
order by city asc),
(COUNT(use4) as "use2436"
from demo
where use4 = '2436'
group by city
order by city asc),
(COUNT(use4) as "use9133"
from demo
where use4 = '9133'
group by city
order by city asc)

I also tried adding city and county along with additional counts:

(select zip as "ZIPCODE", city, county, COUNT(use4) as "Use4count1466", COUNT(use4) as "Use4count9133"
from demo
where use4 = '1466',
where use4 = '9133' 
group by zip, city, county
order by zip asc)

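Is there a way to do this efficiently? Should I create a loop that keeps counting every category of every attribute? How many SELECTs can a single query contain? I need a way to show zip code, county, and city, and to count all categories of each categorical attribute.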

Tags: postgresql, count, categorical-data

Solution

You can do this in a single query using filtered aggregation:

select city, 
       count(*) filter (where use4 = '2056') as use2056,
       count(*) filter (where use4 = '2436') as use2436,
       count(*) filter (where use4 = '9133') as use9133
from demo
where use4 in ('2056', '2436', '9133')
group by city;
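
With 250+ categories, typing one filter line per category gets tedious. One option is to generate the column list from the category values (for example, the result of select distinct use4 from demo). A minimal sketch in Python, assuming the table and column names come from trusted code; build_filter_count_query and quote_literal are hypothetical helpers written for this illustration, not part of PostgreSQL or any driver:

```python
def quote_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_filter_count_query(table, group_cols, attr, categories):
    """Build a SELECT with one count(*) FILTER column per category.

    Hypothetical helper: table and column names are inserted as-is, so they
    must come from trusted code, never from user input.
    """
    literals = [quote_literal(c) for c in categories]
    count_cols = []
    for cat, lit in zip(categories, literals):
        # Quoted alias such as "use4_2056"; embedded double quotes are doubled.
        alias = f"{attr}_{cat}".replace('"', '""')
        count_cols.append(f'count(*) filter (where {attr} = {lit}) as "{alias}"')
    select_list = ",\n       ".join(list(group_cols) + count_cols)
    return (
        f"select {select_list}\n"
        f"from {table}\n"
        f"where {attr} in ({', '.join(literals)})\n"
        f"group by {', '.join(group_cols)};"
    )


# Category values hard-coded here; in practice they could be read from
# "select distinct use4 from demo".
query = build_filter_count_query("demo", ["city"], "use4", ["2056", "2436", "9133"])
print(query)
```

The printed statement has the same shape as the query above, only with generated aliases, so it can be pasted into psql or sent through whichever driver you use.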

The same filtered aggregation works for your second query:

select zip as "ZIPCODE", 
       city, 
       county, 
       count(*) filter (where use4 = '1466') as use4count1466, 
       count(*) filter (where use4 = '9133') as use4count9133
from demo
where use4 in ('1466','9133')
group by zip, city, county;
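
If you do not need one column per category, a long format may be simpler for an attribute with 250+ categories: group by the attribute itself to get one row per location and category, then reshape in Python. A rough sketch, assuming an in-memory SQLite database as a stand-in for PostgreSQL and made-up sample rows; only the table and column names come from the question:

```python
import sqlite3
from collections import defaultdict

# In-memory SQLite stands in for PostgreSQL here; the GROUP BY query itself
# is plain SQL. The sample rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("create table demo (zip text, city text, county text, use4 text)")
conn.executemany(
    "insert into demo values (?, ?, ?, ?)",
    [
        ("10001", "Springfield", "A", "2056"),
        ("10001", "Springfield", "A", "2056"),
        ("10001", "Springfield", "A", "9133"),
        ("20002", "Springfield", "B", "2436"),
    ],
)

# One row per (location, category): no need to list the categories up front.
rows = conn.execute(
    """
    select zip, city, county, use4, count(*) as n
    from demo
    group by zip, city, county, use4
    order by zip, use4
    """
).fetchall()

# Reshape into one dict of category counts per location.
summary = defaultdict(dict)
for zip_code, city, county, category, n in rows:
    summary[(zip_code, city, county)][category] = n

for location, counts in summary.items():
    print(location, counts)
```

Grouping by zip and county together with city keeps the two made-up "Springfield" rows apart, which is the same reason the query above groups by all three columns.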
