sql - SQL:用条件填充缺失的记录
问题描述
我需要按日期计算库存中存在的产品数量。然而,在数据库中,产品仅在消费者查看时才被记录。
例如考虑这个基本的表结构:
date | productId | views
July 1 | A | 8
July 2 | A | 6
July 2 | B | 4
July 3 | A | 2
July 4 | A | 8
July 4 | B | 6
July 4 | C | 4
July 5 | C | 2
July 10 | A | 17
使用以下查询,我尝试确定给定日期库存中的产品数量。
select date, count(distinct productId) as Inventory, sum(views) as views
from (
select date, productId, count(*) as views
from SomeTable
group by date, productID
order by date asc, productID asc
)
group by date
这是输出
date | Inventory | views
July 1 | 1 | 8
July 2 | 2 | 10
July 3 | 1 | 2
July 4 | 3 | 18
July 5 | 1 | 2
July 10 | 1 | 17
由于缺少行,我的输出不能准确反映库存中有多少产品。
对库存的正确理解如下:
- 产品 A 于 7 月 1 日至 7 月 10 日存在库存。
- 产品 B 存在于 7 月 2 日至 7
月 4 日的库存。 - 产品 C 存在于 7 月 4 日至 7 月 5 日的库存。
正确的 SQL 输出应该是:
date | Inventory | views
July 1 | 1 | 8
July 2 | 2 | 10
July 3 | 2 | 2
July 4 | 3 | 18
July 5 | 2 | 2
July 6 | 1 | 0
July 7 | 1 | 0
July 8 | 1 | 0
July 9 | 1 | 0
July 10 | 1 | 17
如果您继续跟进,请让我确认我很乐意将“库存中”定义为第一个视图和最后一个视图之间的日期差。
我遵循了以下错误过程:
首先,我创建了一个表格,它是每个 productID 和每个日期的笛卡尔积。''' 日期为(从 SomeTable 中选择日期,按日期分组),产品为(从 SomeTable 中按 productId 选择 productId)选择 Dates.date,Products.productId 从 Dates 中交叉加入产品 '''
然后我尝试进行右外连接以将其减少为仅丢失的记录:
with Records as (
select date, productId, count(*) as views
from SomeTable
group by date, productId
),
Cartesian as (
{See query above}
)
Select Cartesian.date, Cartesian.productId, 0 as views #for upcoming union
from Cartesian right outer join Records
on Cartesian.date = Records.date
where Records.productId is null
然后,将缺少的行合并到记录中。这样做,我创建了一个新问题:额外的行。
date | productId | views
July 1 | A | 8
July 1 | B | 0
July 1 | C | 0
July 2 | A | 6
July 2 | B | 4
July 2 | C | 0
July 3 | A | 2
July 3 | B | 0
July 3 | C | 0
July 4 | A | 8
July 4 | B | 6
July 4 | C | 4
July 5 | A | 2
July 5 | B | 0
July 5 | C | 0
July 6 | A | 0
July 6 | B | 0
July 6 | C | 0
July 7 | A | 0
July 7 | B | 0
July 7 | C | 0
July 8 | A | 0
July 8 | B | 0
July 8 | C | 0
July 9 | A | 0
July 9 | B | 0
July 9 | C | 0
July 10 | A | 17
July 10 | B | 0
July 10 | C | 0
select date, count(distinct productId) as Inventory, sum(views) as views
当我在该表上运行我的简单查询
时,我再次得到错误的输出:
date | Inventory | views
July 1 | 3 | 8
July 2 | 3 | 10
July 3 | 3 | 2
July 4 | 3 | 18
July 5 | 3 | 2
July 6 | 3 | 0
July 7 | 3 | 0
July 8 | 3 | 0
July 9 | 3 | 0
July 10 | 3 | 17
我的下一个想法是遍历每个 productId,确定它的第一个和最后一个日期,然后将它与笛卡尔表联合,条件是 Cartesian.date 介于每个特定产品的第一个和最后一个日期之间。
必须有一种更简单的方法来做到这一点。谢谢。
解决方案
以下是 BigQuery 标准 SQL
#standardSQL
WITH dates AS (
SELECT day FROM (
SELECT MIN(day) min_day, MAX(day) max_day
FROM `project.dataset.table`
), UNNEST(GENERATE_DATE_ARRAY(min_day, max_day, INTERVAL 1 DAY)) day
), ranges AS (
SELECT productId, MIN(day) min_day, MAX(day) max_day
FROM `project.dataset.table` t
GROUP BY productId
)
SELECT day, COUNT(DISTINCT productId) Inventory, SUM(IFNULL(views, 0)) views
FROM dates d, ranges r
LEFT JOIN `project.dataset.table` USING(day, productId)
WHERE day BETWEEN min_day AND max_day
GROUP BY day
如果适用于您的问题中的示例数据,如下例所示
#standardSQL
WITH `project.dataset.table` AS (
SELECT DATE '2019-07-01' day, 'A' productId, 8 views UNION ALL
SELECT '2019-07-02', 'A', 6 UNION ALL
SELECT '2019-07-02', 'B', 4 UNION ALL
SELECT '2019-07-03', 'A', 2 UNION ALL
SELECT '2019-07-04', 'A', 8 UNION ALL
SELECT '2019-07-04', 'B', 6 UNION ALL
SELECT '2019-07-04', 'C', 4 UNION ALL
SELECT '2019-07-05', 'C', 2 UNION ALL
SELECT '2019-07-10', 'A', 17
), dates AS (
SELECT day FROM (
SELECT MIN(day) min_day, MAX(day) max_day
FROM `project.dataset.table`
), UNNEST(GENERATE_DATE_ARRAY(min_day, max_day, INTERVAL 1 DAY)) day
), ranges AS (
SELECT productId, MIN(day) min_day, MAX(day) max_day
FROM `project.dataset.table` t
GROUP BY productId
)
SELECT day, COUNT(DISTINCT productId) Inventory, SUM(IFNULL(views, 0)) views
FROM dates d, ranges r
LEFT JOIN `project.dataset.table` USING(day, productId)
WHERE day BETWEEN min_day AND max_day
GROUP BY day
-- ORDER BY day
结果是
Row day Inventory views
1 2019-07-01 1 8
2 2019-07-02 2 10
3 2019-07-03 2 2
4 2019-07-04 3 18
5 2019-07-05 2 2
6 2019-07-06 1 0
7 2019-07-07 1 0
8 2019-07-08 1 0
9 2019-07-09 1 0
10 2019-07-10 1 17
推荐阅读
- wordpress - Wordpress 在上传时自动缩放图像
- php - php str_replace() 不会删除空格
- python - Python:计算不同格式的两个日期之间的天数?
- firebase - 使用 API REST Firebase 和 Flutter 实现离线持久化
- tensorflow - 如何从多个 TFRecord 文件中的数据创建 TensorFlow 2 生成器?
- java - 想使用 Comparator 按时间戳在 ArrayList 中排序聊天,但它不起作用,我不知道为什么
- yii2 - 如何摆脱 Froala 所见即所得编辑器的基于文本的水印(Yii2 小部件)
- docker - 如何将多个文件复制到 Docker 容器内的特定目录/位置
- python - 有什么方法可以将 Kivy 相机照片保存为 base64 而不将其导出为 png?
- c# - 使用 StaticFileOptions() 从 Razor 类库中中断嵌入的静态文件