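python - Converting a conditional RANK window function from SQL to Pandas
Problem description
I am trying to convert a window function from SQL to Pandas. It should apply only where a matching row exists; otherwise a NULL (None) value should be inserted.
SQL code (example)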
SELECT
    [Customer].[ID_customer],
    [Customer].[cTimestamp],
    [TMP_Latest_request].[ID_req] AS [ID of Latest request]
FROM [table].[Customer] AS [Customer]
LEFT JOIN (
    SELECT * FROM (
        SELECT [ID_req], [ID_customer], [rTimestamp],
            RANK() OVER (PARTITION BY [ID_customer] ORDER BY [rTimestamp] DESC) AS rnk
        FROM [table].[Customer_request]
    ) AS [Q]
    WHERE rnk = 1
) AS [TMP_Latest_request]
ON [Customer].[ID_customer] = [TMP_Latest_request].[ID_customer]
Example
Join the ID of the latest customer request (if one exists) to each customer.
table: Customer
+-------------+------------+
| ID_customer | cTimestamp |
+-------------+------------+
| 1 | 2014 |
| 2 | 2014 |
| 3 | 2015 |
+-------------+------------+
table: Customer_request
+--------+-------------+------------+
| ID_req | ID_customer | rTimestamp |
+--------+-------------+------------+
| 1 | 1 | 2012 |
| 2 | 1 | 2013 |
| 3 | 1 | 2014 |
| 4 | 2 | 2014 |
+--------+-------------+------------+
Result: table: merged
+-------------+------------+----------------------+
| ID_customer | cTimestamp | ID of Latest request |
+-------------+------------+----------------------+
| 1 | 2014 | 3 |
| 2 | 2014 | 4 |
| 3 | 2015 | None/NULL |
+-------------+------------+----------------------+
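What is the equivalent in Python Pandas?
Solution
Instead of using the RANK() function, you can simply use the query below, which is easy to convert. It assumes that ID_req increases with rTimestamp, as it does in the sample data, so the largest ID_req per customer is also the latest request.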
SELECT A.ID_customer, A.cTimestamp, B.ID_req
FROM Customer A
LEFT JOIN (
    SELECT ID_customer, MAX(ID_req) AS ID_req
    FROM Customer_request
    GROUP BY ID_customer
) B
ON A.ID_customer = B.ID_customer
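The MAX/GROUP BY query maps to Pandas fairly directly. The following is a minimal sketch, not a definitive implementation: the DataFrame names customer and customer_request are assumptions chosen to mirror the sample tables, and the cast to the nullable "Int64" dtype is an optional extra that keeps the IDs as integers after the left join.

```python
import pandas as pd

# Sample data copied from the tables in the question (frame names are assumptions)
customer = pd.DataFrame({
    "ID_customer": [1, 2, 3],
    "cTimestamp": [2014, 2014, 2015],
})
customer_request = pd.DataFrame({
    "ID_req": [1, 2, 3, 4],
    "ID_customer": [1, 1, 1, 2],
    "rTimestamp": [2012, 2013, 2014, 2014],
})

# Subquery B: SELECT ID_customer, MAX(ID_req) ... GROUP BY ID_customer
latest = customer_request.groupby("ID_customer", as_index=False)["ID_req"].max()

# LEFT JOIN: every customer is kept; customers without a request get a missing value
merged = customer.merge(latest, on="ID_customer", how="left")
merged = merged.rename(columns={"ID_req": "ID of Latest request"})

# Optional: the join turns the column into floats; "Int64" restores integers with <NA>
merged["ID of Latest request"] = merged["ID of Latest request"].astype("Int64")

print(merged)
```

For the sample data, customers 1 and 2 get request IDs 3 and 4, and customer 3 gets a missing value.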
Try this query, and ask me in the comments if you run into any problems.
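If ID_req is not guaranteed to increase with rTimestamp, a closer translation of the RANK() query from the question may be safer. The sketch below assumes the same sample data in DataFrames named customer and customer_request (both names are assumptions). It uses groupby(...).rank(method="min", ascending=False), which assigns tied rows the same lowest rank, as SQL RANK() does.

```python
import pandas as pd

# Sample data copied from the tables in the question (frame names are assumptions)
customer = pd.DataFrame({
    "ID_customer": [1, 2, 3],
    "cTimestamp": [2014, 2014, 2015],
})
customer_request = pd.DataFrame({
    "ID_req": [1, 2, 3, 4],
    "ID_customer": [1, 1, 1, 2],
    "rTimestamp": [2012, 2013, 2014, 2014],
})

# RANK() OVER (PARTITION BY ID_customer ORDER BY rTimestamp DESC) AS rnk
ranked = customer_request.assign(
    rnk=customer_request.groupby("ID_customer")["rTimestamp"]
    .rank(method="min", ascending=False)
)

# WHERE rnk = 1: keep only the latest request(s) per customer
latest = ranked.loc[ranked["rnk"] == 1, ["ID_customer", "ID_req"]]

# LEFT JOIN onto Customer; customers without a request get a missing value
merged = customer.merge(latest, on="ID_customer", how="left")
merged = merged.rename(columns={"ID_req": "ID of Latest request"})
merged["ID of Latest request"] = merged["ID of Latest request"].astype("Int64")

print(merged)
```

As with RANK() in SQL, a customer whose latest requests share the same rTimestamp would appear more than once in the result. If exactly one row per customer is needed, sorting by rTimestamp in descending order and calling drop_duplicates("ID_customer") before the merge is one possible alternative.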