Find whether a value already exists in a partition (SQL)

Problem description

I have the following data:

term    period  id
7       201910  A2C5
7       201911  A2C5
7       201912  A2C5
7       202001  A2C5
7       202002  A2C5
7       202003  A2C5
7       202004  A2C5

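I need to find the percentage of new IDs for each term and period. In the table above, ID A2C5 first appears in term 7, period 201910, so that row should show 100%. All the other rows should show 0%, because the ID already appeared earlier.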

Tags: postgresql

Solution


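This solution is a bit involved, but it should get you where you need to go. It first finds the earliest term and period for each ID, then divides the count of new IDs in each term and period by the count of all IDs in that same term and period.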

-- Find the earliest instance of each ID
WITH earliest_instances AS (
    SELECT DISTINCT ON (id) * 
    FROM your_table
    ORDER BY id, term, period  -- you may need to adjust this order
), 
-- For each term and period, count how many IDs showed up there first
earliest_counts AS (
    SELECT term, period, count(*) 
    FROM earliest_instances 
    GROUP BY term, period
), 
-- For each term and period, count how many IDs there are in total
total_counts AS (
    SELECT term, period, count(*) 
    FROM your_table
    GROUP BY term, period
)
-- Put those together to make a percentage of new IDs
SELECT 
    total_counts.term,
    total_counts.period, 
    -- count(*) is bigint, so this is integer division and truncates to a whole percent
    100 * COALESCE(earliest_counts.count, 0) / total_counts.count AS new_ids_percentage 
FROM total_counts 
LEFT JOIN earliest_counts ON (
    total_counts.term = earliest_counts.term 
    AND total_counts.period = earliest_counts.period
)
ORDER BY term, period;

 term | period | new_ids_percentage
------+--------+--------------------
 7    | 201910 |                100
 7    | 201911 |                  0
 7    | 201912 |                  0
 7    | 202001 |                  0
 7    | 202002 |                  0
 7    | 202003 |                  0
 7    | 202004 |                  0
(7 rows)
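
As an alternative, the same idea can be sketched with a window function partitioned by id, which avoids the join. This is only a sketch under a few assumptions: the temporary table and its column types are made up to mirror the sample data, period is assumed to sort chronologically (YYYYMM), and an ID counts as "new" on every row that carries its earliest period. Unlike DISTINCT ON, that last rule flags more than one row if the same ID shows up in two terms during its first period.

```sql
-- Hypothetical setup: table name and column types are assumptions, not from the question
CREATE TEMP TABLE your_table (term integer, period integer, id text);

INSERT INTO your_table (term, period, id) VALUES
    (7, 201910, 'A2C5'),
    (7, 201911, 'A2C5'),
    (7, 201912, 'A2C5'),
    (7, 202001, 'A2C5'),
    (7, 202002, 'A2C5'),
    (7, 202003, 'A2C5'),
    (7, 202004, 'A2C5');

-- Flag each row whose period is the earliest period seen for that id
WITH flagged AS (
    SELECT
        term,
        period,
        id,
        period = min(period) OVER (PARTITION BY id) AS is_new
    FROM your_table
)
-- Share of flagged rows per term and period, as a decimal percentage
SELECT
    term,
    period,
    round(100.0 * count(*) FILTER (WHERE is_new) / count(*), 2) AS new_ids_percentage
FROM flagged
GROUP BY term, period
ORDER BY term, period;
```

With the sample rows this should give 100.00 for period 201910 and 0.00 for the others. Multiplying by 100.0 makes the division numeric, so a term and period with, say, one new ID out of three would come out as 33.33 rather than being truncated to 33.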
