postgresql - postgresql - 尝试在范围标准内对类似数据进行分组
问题描述
嗨——我的 sql 技能不是最好的,但我被分配到一个深入数据的项目。基本上,我需要根据多个标准为类似项目分配一个组值。我可以很好地匹配 2 列,但只有当行的值落在彼此的特定范围内时,第 3 列和第 4 列才适用。
示例标准
1. Match Item Code on first 4 digits after leading zeros.
2. Match Description based on first word only.
3. Outlet Number, only group if value is +/- 200 of each other
4. Cost, only group if value is within .75 of each other
“组”值应该是这样的(这完全是虚构的数据):
Item_Code Description Cat_Number Cost Group
123401 HM Book Green 5000 15.50 1
0123402 HM Book Blue 5200 14.75 2
123403 HM Book Orange 5100 14.75 2
0123404 HM Book Red 4700 14.75 3
123405 HM Book Yellow 5200 14.50 2
798321 Car Light Bulb 4000 5.25 4
798322 Car Light Lens 4610 5.50 5
798330 Car Light fuse 4600 4.75 5
0074789 Pencil #2 9900 1.25 6
747889 Pencil 9800 0.99 6
747890 Pencil #1 9100 1.10 7
074788 Pencils 9345 1.99 8
074785 Pencil B 8000 1.50 9
这是我前两列的内容,效果很好,甚至似乎在成本列上工作,但如果在 200 + / - 彼此的范围 - 我尝试过的没有任何工作:
SELECT SUBSTR(TRIM(item_code, '0'), 1, 4),
SPLIT_PART(description, ' ', 1),
cost::numeric(19,2),
DENSE_RANK() OVER (ORDER BY SUBSTR(TRIM(item_code, '0'), 1, 4),
SPLIT_PART(description, ' ', 1), cost::numeric(19,2)) AS group_number
FROM some_items
GROUP BY SUBSTR(TRIM(item_code, '0'), 1, 4),
SPLIT_PART(description, ' ', 1),
(
(cost = cost +.75),
(cost = cost -.75)
), cost
ORDER BY item_code;
我尝试了与 Cost 相同的逻辑,我尝试了公用表表达式和连接、嵌套选择、Having 子句的不同组合——但一切都回到我想以某种方式在组中合并一个范围 for猫号也是。成本似乎正在发挥作用也可能是一种侥幸——但到目前为止,它看起来一直是准确的。我非常感谢任何人可以提供的任何帮助或指导。谢谢!!
解决方案
我找到了诀窍!我得到的组,即使它们没有与示例相同的组 ID,也由相同的对象组成。
生成的查询既丑陋又复杂,我会在这个答案的最后把它带给你,但实际上最有趣的是这个技巧。
一开始,为了简洁起见,我将只考虑列cat_number
(因为成本相同);在那之后,我会把所有的和平放在一起。
让我们一步一步来!
select item_code, description, cat_number,
cat_number - lag(cat_number,1,cat_number) over(order by cat_number) as le
from table1;
我们有这个输出:
| item_code | description | cat_number | le |
| --------- | -------------- | ---------- | ---- |
| 798321 | Car Light Bulb | 4000 | 0 |
| 798330 | Car Light fuse | 4600 | 600 |
| 798322 | Car Light Lens | 4610 | 10 |
| 123404 | HM Book Red | 4700 | 90 |
| 123401 | HM Book Green | 5000 | 300 |
| 123403 | HM Book Orange | 5100 | 100 |
| 123405 | HM Book Yellow | 5200 | 100 |
| 123402 | HM Book Blue | 5200 | 0 |
| 74785 | Pencil B | 8000 | 2800 |
| 747890 | Pencil #1 | 9100 | 1100 |
| 74788 | Pencils | 9345 | 245 |
| 747889 | Pencil | 9800 | 455 |
| 74789 | Pencil #2 | 9900 | 100 |
le
cat_number
包含每一行的值与前一行的值之间的差异。
再一步:
select item_code
, description
, cat_number
, case
when le >= 200 then cat_number
else null
end as lle from(
select item_code, description, cat_number,
cat_number - lag(cat_number,1,cat_number) over(order by cat_number) as le
from table1
) s1;
| item_code | description | cat_number | lle |
| --------- | -------------- | ---------- | ---- |
| 798321 | Car Light Bulb | 4000 | |
| 798330 | Car Light fuse | 4600 | 4600 |
| 798322 | Car Light Lens | 4610 | |
| 123404 | HM Book Red | 4700 | |
| 123401 | HM Book Green | 5000 | 5000 |
| 123403 | HM Book Orange | 5100 | |
| 123405 | HM Book Yellow | 5200 | |
| 123402 | HM Book Blue | 5200 | |
| 74785 | Pencil B | 8000 | 8000 |
| 747890 | Pencil #1 | 9100 | 9100 |
| 74788 | Pencils | 9345 | 9345 |
| 747889 | Pencil | 9800 | 9800 |
| 74789 | Pencil #2 | 9900 | |
lle
仅当对应的值cat_number
不在与前列相同的范围内时,我们才有一个值。
再一步
select item_code
, description
, cat_number
, max(lle) over (order by cat_number) from (
select *
, case
when le >= 200 then cat_number
else null
end as lle from(
select item_code, description, cat_number,
cat_number - lag(cat_number,1,cat_number) over(order by cat_number) as le
from table1
) s1
)s2;
| item_code | description | cat_number | max |
| --------- | -------------- | ---------- | ---- |
| 798321 | Car Light Bulb | 4000 | |
| 798330 | Car Light fuse | 4600 | 4600 |
| 798322 | Car Light Lens | 4610 | 4600 |
| 123404 | HM Book Red | 4700 | 4600 |
| 123401 | HM Book Green | 5000 | 5000 |
| 123403 | HM Book Orange | 5100 | 5000 |
| 123405 | HM Book Yellow | 5200 | 5000 |
| 123402 | HM Book Blue | 5200 | 5000 |
| 74785 | Pencil B | 8000 | 8000 |
| 747890 | Pencil #1 | 9100 | 9100 |
| 74788 | Pencils | 9345 | 9345 |
| 747889 | Pencil | 9800 | 9800 |
| 74789 | Pencil #2 | 9900 | 9800 |
我们用迄今为止遇到的最大值填充空值......现在我们可以使用dense_rank()
(正如你所猜测的)
select item_code
, description
, cat_number
, dense_rank() over (order by llle) from (
select *
, max(lle) over (order by cat_number) as llle from (
select *
, case
when le >= 200 then cat_number
else null
end as lle from(
select item_code, description, cat_number,
cat_number - lag(cat_number,1,cat_number) over(order by cat_number) as le
from table1
) s1
)s2
)s3;
| item_code | description | cat_number | dense_rank |
| --------- | -------------- | ---------- | ---------- |
| 798330 | Car Light fuse | 4600 | 1 |
| 798322 | Car Light Lens | 4610 | 1 |
| 123404 | HM Book Red | 4700 | 1 |
| 123405 | HM Book Yellow | 5200 | 2 |
| 123402 | HM Book Blue | 5200 | 2 |
| 123401 | HM Book Green | 5000 | 2 |
| 123403 | HM Book Orange | 5100 | 2 |
| 74785 | Pencil B | 8000 | 3 |
| 747890 | Pencil #1 | 9100 | 4 |
| 74788 | Pencils | 9345 | 5 |
| 74789 | Pencil #2 | 9900 | 6 |
| 747889 | Pencil | 9800 | 6 |
| 798321 | Car Light Bulb | 4000 | 7 |
我们在同一组中只有cat_number
指定范围内的值!
现在将所有内容放在同一个查询中!
select item_code
, description
, cat_number
, cost
, dense_rank() over (
order by
substring(item_code::text, 0, 4),
substring(description, 0, 4),
llle,
lllc)
from (
select *
, max(lle) over (order by cat_number) as llle
, max(llc) over (order by cost) lllc from (
select *
, case
when le >= 200 then cat_number
else null
end as lle
, case
when lc >= 0.75 then cost
else null
end as llc
from(
select item_code, description, cat_number,cost,
cat_number - lag(cat_number,1,cat_number) over(order by cat_number) as le,
cost - lag(cost, 1, cost) over (order by cost) as lc
from table1
) s1
)s2
)s3;
| item_code | description | cat_number | cost | dense_rank |
| --------- | -------------- | ---------- | ----- | ---------- |
| 123404 | HM Book Red | 4700 | 14.75 | 1 |
| 123402 | HM Book Blue | 5200 | 14.75 | 2 |
| 123405 | HM Book Yellow | 5200 | 14.50 | 2 |
| 123403 | HM Book Orange | 5100 | 14.75 | 2 |
| 123401 | HM Book Green | 5000 | 15.50 | 3 |
| 74785 | Pencil B | 8000 | 1.50 | 4 |
| 747890 | Pencil #1 | 9100 | 1.10 | 5 |
| 74788 | Pencils | 9345 | 1.99 | 6 |
| 747889 | Pencil | 9800 | 0.99 | 7 |
| 74789 | Pencil #2 | 9900 | 1.25 | 7 |
| 798322 | Car Light Lens | 4610 | 5.50 | 8 |
| 798330 | Car Light fuse | 4600 | 4.75 | 8 |
| 798321 | Car Light Bulb | 4000 | 5.25 | 9 |
Ps:item_code
开头没有0,我用了一个子字符串而不是你更正确的表达,但概念是一样的。
再见
推荐阅读
- javascript - 如果其他反应,你可以内联运行函数吗?
- sql-server - 标记满足两个条件且第三个值最低的所有行
- c++ - 从 gcc 4.6.3 移植到 5.4.0 时出现 basic_string.h 错误
- git - macOS Git 客户端可以使用存储在用户 Keychain 中的证书吗?
- docker - .dockerfile 扩展名是什么?
- html - 为什么对齐项目中心不起作用?
- javascript - Appium 错误处理不起作用
- rpm - Yocto 和使用签名 rpm 包存储库时的图像生成
- docker - Strapi 路由返回 404 未找到
- python - 为什么不安装芹菜