首页 > 解决方案 > 根据时差和熊猫数据框中列的条件创建一个新的数据框

问题描述

我有一个数据框,其中有一个工单 ID,以及下表中显示的日期时间列采取的后续操作。

ticketID    ChangeDate  OldStatus   NewStatus
0   1012327 2019-03-18 09:00:32.903 R or O  Action mail sent to client
1   1012327 2019-03-18 09:21:34.820 Action mail sent to client  Response Client - R
2   1012327 2019-03-18 09:34:21.890 Response Client - R Status Updated
3   1012328 2019-03-18 07:00:09.960 R or O  ticket Closed - None
4   1012328 2019-03-18 07:09:31.420 ticket Closed - None    Status Updated
5   1012329 2019-03-18 06:52:03.490 R or O  ticket Closed - Satisfied
6   1012329 2019-03-18 07:09:33.433 ticket Closed - Satisfied   Status Updated
7   1012330 2019-03-18 10:25:13.493 R or O  Action mail sent to Service
8   1012330 2019-03-18 10:55:20.963 Action mail sent to Service ticket Closed - Service Responded
9   1012330 2019-03-18 11:02:05.327 ticket Closed - Service Responded   Status Updated
10  1012332 2019-03-18 09:00:41.967 R or O  Action mail sent to client
11  1012332 2019-03-18 10:24:20.150 Action mail sent to client  Response Client - R
12  1012332 2019-03-18 10:32:40.717 Response Client - R Status Updated

熊猫表中的数据框

现在,我有一些工单 ID,其中一些根据提供的状态进行了更多观察。您还可以看到,对于票证 ID,对于下一次观察,新状态变为旧状态,并为其提供新状态,该状态不断更新,直到我采取一些关闭操作。

我想创建一个具有以下格式的新数据框/系列。

ticket ID  | Datetime1  | Oldest Status  | New Status | Datetime2  | New Status2| Datetime3  | New Status3 ....

这样我就有了最旧的状态和新的状态,日期如上所示,直到我们为每个票证 ID 执行此操作。

我的计划是稍后使用这个数据集来计算时间差。

标签: pythonpandasnumpydataframedata-science

解决方案


我将首先使用 groupbyticketID来计算每张票的排名,然后使用该排名作为列和 ticketID 作为索引来旋转数据框以获得预期的数据。

对列进行排序后,您将获得预期的数据框。是时候重命名列并重置索引以获得漂亮的数据框了。代码可以是:

df['rank'] = df.groupby('ticketID').apply(lambda x:
                                          pd.Series(range(len(x)))).values
resul = df.pivot('ticketID', 'rank').fillna('')
resul.columns = resul.columns.swaplevel()
resul.sort_index(axis=1,inplace=True, level=0, sort_remaining=False)
resul.columns = ['{1}_{0}'.format(*c) for c in resul.columns]
resul.reset_index(inplace=True)

使用您的示例数据,它提供:

   ticketID             ChangeDate_0 OldStatus_0                  NewStatus_0             ChangeDate_1                  OldStatus_1                        NewStatus_1             ChangeDate_2                        OldStatus_2     NewStatus_2
0   1012327  2019-03-18 09:00:32.903      R or O   Action mail sent to client  2019-03-18 09:21:34.820   Action mail sent to client                Response Client - R  2019-03-18 09:34:21.890                Response Client - R  Status Updated
1   1012328  2019-03-18 07:00:09.960      R or O         ticket Closed - None  2019-03-18 07:09:31.420         ticket Closed - None                     Status Updated                                                                            
2   1012329  2019-03-18 06:52:03.490      R or O    ticket Closed - Satisfied  2019-03-18 07:09:33.433    ticket Closed - Satisfied                     Status Updated                                                                            
3   1012330  2019-03-18 10:25:13.493      R or O  Action mail sent to Service  2019-03-18 10:55:20.963  Action mail sent to Service  ticket Closed - Service Responded  2019-03-18 11:02:05.327  ticket Closed - Service Responded  Status Updated
4   1012332  2019-03-18 09:00:41.967      R or O   Action mail sent to client  2019-03-18 10:24:20.150   Action mail sent to client                Response Client - R  2019-03-18 10:32:40.717                Response Client - R  Status Updated

推荐阅读