How do I calculate a DATE difference based on the status in another column?

Problem description

I have the following dataset:

D_DATE       BIN Number   Disposition    Unit Assigned        
2018-01-04    10005      SWO Issued      PLUMBING DIVISION     
2016-06-23    10005      SWO Issued      SCAFFOLD UNIT         
2016-06-23    10005      SWO Rescinded   SCAFFOLD UNIT         
2018-01-17    10005      SWO Rescinded   PLUMBING DIVISION  
2019-01-04    10006      SWO Rescinded   BEST SQUAD 
2018-12-21    10006      SWO Issued      BEST SQUAD            

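I want to create a new field named "Duration" that captures the time between an SWO being issued and being rescinded for each BIN number. Note that a BIN number can appear several times, with different dates and different units, so more than one unit can issue an SWO on the same BIN number.

I am trying to get output like the following, where the date difference is calculated per BIN number and per unit, based on when the SWO was issued and rescinded.

Output: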

    D_DATE       BIN Number   Disposition    Unit Assigned         Duration
    2018-01-04    10005      SWO Issued      PLUMBING DIVISION     13 days
    2016-06-23    10005      SWO Issued      SCAFFOLD UNIT         0 days
    2016-06-23    10005      SWO Rescinded   SCAFFOLD UNIT         0 days
    2018-01-17    10005      SWO Rescinded   PLUMBING DIVISION     13 days
    2019-01-04    10006      SWO Rescinded   BEST SQUAD            14 days
    2018-12-21    10006      SWO Issued      BEST SQUAD            14 days

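I tried the following, but it groups on BIN number alone, so it does not give me a separate result for each unit and its disposition status on that BIN number: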

df2 = df2.groupby('BIN Number')['D_DATE'].agg(['max','min'])

df2["Duration"] = df2['max'].sub(df2['min']).dt.days
print(df2)

Any help is appreciated.

Tags: python, pandas

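Solution

Sort the rows by date (then by BIN number and Disposition) so that each "SWO Issued" row is immediately followed by its "SWO Rescinded" row, and use shift() to take the difference between consecutive rows. Only the second row of each pair then holds a meaningful value, so copy it onto the first row of the pair as well. This relies on the issue/rescind pairs not overlapping in time, which holds for the sample data.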

import pandas as pd
import io

data = '''
D_DATE "BIN Number" Disposition "Unit Assigned"
2018-01-04 10005 "SWO Issued" "PLUMBING DIVISION"
2016-06-23 10005 "SWO Issued" "SCAFFOLD UNIT"
2016-06-23 10005 "SWO Rescinded" "SCAFFOLD UNIT"
2018-01-17 10005 "SWO Rescinded" "PLUMBING DIVISION"
2019-01-04 10006 "SWO Rescinded" "BEST SQUAD"
2018-12-21 10006 "SWO Issued" "BEST SQUAD"
'''

df = pd.read_csv(io.StringIO(data), sep=' ')
df['D_DATE'] = pd.to_datetime(df['D_DATE'])
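# The lines above only rebuild the sample data; the solution starts here.

# Sort by date so each "SWO Issued" row is directly followed by its "SWO Rescinded" row.
df.sort_values(['D_DATE', 'BIN Number', 'Disposition'], inplace=True)
# Keep the original row order in an 'index' column so it can be restored later.
df.reset_index(inplace=True)
# Difference from the previous row; only the second row of each pair is meaningful.
df['Duration'] = df['D_DATE'] - df['D_DATE'].shift(1)
# Copy each pair's value from its second row onto its first row.
# Using .loc with a plain array avoids chained assignment and index alignment.
df.loc[df.index[::2], 'Duration'] = df['Duration'].iloc[1::2].to_numpy()
# Restore the original row order.
df.sort_values('index', inplace=True)
df = df.set_index('index')
print(df)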

       D_DATE      BIN Number  Disposition    Unit Assigned      Duration
index
0      2018-01-04  10005       SWO Issued     PLUMBING DIVISION  13 days
1      2016-06-23  10005       SWO Issued     SCAFFOLD UNIT      0 days
2      2016-06-23  10005       SWO Rescinded  SCAFFOLD UNIT      0 days
3      2018-01-17  10005       SWO Rescinded  PLUMBING DIVISION  13 days
4      2019-01-04  10006       SWO Rescinded  BEST SQUAD         14 days
5      2018-12-21  10006       SWO Issued     BEST SQUAD         14 days
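
As a possible alternative that does not depend on row order, the attempt from the question can be adapted by grouping on both 'BIN Number' and 'Unit Assigned' and using transform() so that every row keeps its place. This is only a sketch: it assumes each BIN/unit combination has exactly one "SWO Issued" row and one "SWO Rescinded" row, as in the sample data. If a unit can issue and rescind several SWOs on the same BIN, max minus min would span all of them and extra pairing logic would be needed. The DataFrame below is rebuilt by hand from the rows in the question.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "D_DATE": pd.to_datetime([
        "2018-01-04", "2016-06-23", "2016-06-23",
        "2018-01-17", "2019-01-04", "2018-12-21",
    ]),
    "BIN Number": [10005, 10005, 10005, 10005, 10006, 10006],
    "Disposition": [
        "SWO Issued", "SWO Issued", "SWO Rescinded",
        "SWO Rescinded", "SWO Rescinded", "SWO Issued",
    ],
    "Unit Assigned": [
        "PLUMBING DIVISION", "SCAFFOLD UNIT", "SCAFFOLD UNIT",
        "PLUMBING DIVISION", "BEST SQUAD", "BEST SQUAD",
    ],
})

# Group on both keys so each unit's SWO on a BIN is handled separately.
dates = df.groupby(["BIN Number", "Unit Assigned"])["D_DATE"]

# Assumption: one issue/rescind pair per group, so max - min is the gap
# between the two events. transform() broadcasts the result to every row
# of the group, leaving the original row order untouched.
df["Duration"] = (dates.transform("max") - dates.transform("min")).dt.days
print(df)
```

Here 'Duration' is an integer number of days. Dropping the trailing .dt.days keeps it as a timedelta, which prints as "13 days" like the output above.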
