Converting a conditional RANK window function from SQL to Pandas

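Problem description

I am trying to convert a window function from SQL to Pandas. The latest request should be joined only where a match exists; otherwise a NULL (None) value is inserted.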

SQL code (example)

SELECT 
    [Customer].[ID_customer],
    [Customer].[cTimestamp],
    [TMP_Latest_request].[ID_req] AS [ID of Latest request]
FROM [table].[Customer] AS [Customer]
LEFT JOIN (
    SELECT * FROM(
        SELECT [ID_req], [ID_customer], [rTimestamp], 
        RANK() OVER(PARTITION BY ID_customer ORDER BY rTimestamp DESC) as rnk
        FROM [table].[Customer_request]
    ) AS [Q]
    WHERE rnk = 1
) AS [TMP_Latest_request]
ON [Customer].[ID_customer] = [TMP_Latest_request].[ID_customer]

Example

Join the ID of the latest customer request (if one exists) to each customer.

table: Customer
+-------------+------------+
| ID_customer | cTimestamp |
+-------------+------------+
|           1 |       2014 |
|           2 |       2014 |
|           3 |       2015 |
+-------------+------------+

table: Customer_request
+--------+-------------+------------+
| ID_req | ID_customer | rTimestamp |
+--------+-------------+------------+
|      1 |           1 |       2012 |
|      2 |           1 |       2013 |
|      3 |           1 |       2014 |
|      4 |           2 |       2014 |
+--------+-------------+------------+

Result: table: merged
+-------------+------------+----------------------+
| ID_customer | cTimestamp | ID of Latest request |
+-------------+------------+----------------------+
|           1 |       2014 | 3                    |
|           2 |       2014 | 4                    |
|           3 |       2015 | None/NULL            |
+-------------+------------+----------------------+

What is the equivalent in Python Pandas?

Tags: python, sql, sql-server, pandas

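Solution

Instead of using the RANK() function, you can use the simpler query below, which is also easy to convert to Pandas. Note that it treats MAX(ID_req) as the latest request, so it matches the RANK() version only when ID_req increases with rTimestamp, as it does in the example data.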

SELECT A.ID_Customer, A.cTimeStamp, B.ID_req
FROM Customer A
LEFT JOIN (
    SELECT ID_Customer, MAX(ID_req) AS ID_req
    FROM Customer_request
    GROUP BY ID_Customer
) B
ON A.ID_Customer = B.ID_Customer
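
The question asks for Pandas, so here is a minimal sketch of how this GROUP BY / LEFT JOIN query might be translated. The DataFrame names `customer` and `customer_request` and the sample rows are assumptions copied from the example tables in the question; the cast to the nullable `Int64` dtype is an optional extra that keeps the IDs as integers despite the missing value.

```python
import pandas as pd

# Sample data taken from the example tables in the question.
customer = pd.DataFrame({
    "ID_customer": [1, 2, 3],
    "cTimestamp": [2014, 2014, 2015],
})
customer_request = pd.DataFrame({
    "ID_req": [1, 2, 3, 4],
    "ID_customer": [1, 1, 1, 2],
    "rTimestamp": [2012, 2013, 2014, 2014],
})

# SELECT ID_customer, MAX(ID_req) ... GROUP BY ID_customer
latest = customer_request.groupby("ID_customer", as_index=False)["ID_req"].max()

# LEFT JOIN: customers without any request keep a missing value.
merged = customer.merge(latest, on="ID_customer", how="left")
merged = merged.rename(columns={"ID_req": "ID of Latest request"})

# Optional: the join turns the column into floats (NaN for customer 3);
# the nullable Int64 dtype restores integer IDs with <NA> for the gap.
merged["ID of Latest request"] = merged["ID of Latest request"].astype("Int64")

print(merged)
```

As with the SQL version, this relies on the largest ID_req being the most recent request.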

Try the query above, and ask me in the comments if you run into any problems.
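
For a translation that stays closer to the RANK() query in the question, the following sketch ranks requests within each customer by rTimestamp instead of relying on ID_req. It is one possible approach, not the only one; the DataFrame names and sample rows are again assumptions taken from the example tables. `rank(method="min", ascending=False)` is used because it mirrors SQL's RANK(): tied timestamps share the same rank, so a customer with two equally recent requests would produce two rows, just as the SQL would.

```python
import pandas as pd

# Sample data taken from the example tables in the question.
customer = pd.DataFrame({
    "ID_customer": [1, 2, 3],
    "cTimestamp": [2014, 2014, 2015],
})
customer_request = pd.DataFrame({
    "ID_req": [1, 2, 3, 4],
    "ID_customer": [1, 1, 1, 2],
    "rTimestamp": [2012, 2013, 2014, 2014],
})

# RANK() OVER (PARTITION BY ID_customer ORDER BY rTimestamp DESC) AS rnk
ranked = customer_request.assign(
    rnk=customer_request.groupby("ID_customer")["rTimestamp"]
    .rank(method="min", ascending=False)
)

# WHERE rnk = 1, keeping only the columns needed for the join.
latest = ranked.loc[ranked["rnk"] == 1, ["ID_customer", "ID_req"]]

# LEFT JOIN onto the customer table; unmatched customers get a missing value.
merged = customer.merge(latest, on="ID_customer", how="left")
merged = merged.rename(columns={"ID_req": "ID of Latest request"})

# Optional: nullable integer dtype instead of floats with NaN.
merged["ID of Latest request"] = merged["ID of Latest request"].astype("Int64")

print(merged)
```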

