Counting value changes in a column over time

Problem description

I have a DataFrame that looks like this:

import pandas as pd

df = pd.DataFrame({'date': {379724: '2017-01-31',
  379725: '2017-01-31',
  414510: '2017-02-14',
  414509: '2017-02-28',
  414511: '2017-02-28',
  507215: '2017-04-27',
  507213: '2017-04-27',
  507214: '2017-04-27',
  507235: '2017-04-27',
  562139: '2017-04-27',
  672967: '2017-07-27',
  672968: '2017-07-27',
  672969: '2017-07-27',
  910729: '2017-12-07',
  990263: '2018-01-30',
  990265: '2018-01-30',
  990264: '2018-01-30',
  121543: '2018-06-26',
  255129: '2018-09-20'},
 'id': {379724: '110000078451',
  379725: '110000078451',
  414510: '110000078451',
  414509: '110000078451',
  414511: '110000078451',
  507215: '110000078451',
  507213: '110000078451',
  507214: '110000078451',
  507235: '110000078451',
  562139: '110000078451',
  672967: '110000078451',
  672968: '110000078451',
  672969: '110000078451',
  910729: '110000078451',
  990263: '110000078451',
  990265: '110000078451',
  990264: '110000078451',
  121543: '110000078451',
  255129: '110000078451'},
 'limit': {379724: 0,
  379725: 1,
  414510: 1,
  414509: 0,
  414511: 0,
  507215: 0,
  507213: 0,
  507214: 1,
  507235: 0,
  562139: 0,
  672967: 0,
  672968: 0,
  672969: 0,
  910729: 0,
  990263: 0,
  990265: 0,
  990264: 0,
  121543: 0,
  255129: 0})

I need to count, for each 'id' group, how many times the value of 'limit' changes to a different value.

The code I came up with is:

count01 = (df.groupby('id')['limit'].rolling(2, min_periods=1)
           .apply(lambda x: ((x[0] != x[-1]) & (x[0] == 1)), raw=True)
           .groupby('id').sum().astype(int).reset_index(name='count01'))

count10 = (df.groupby('id')['limit'].rolling(2, min_periods=1)
           .apply(lambda x: ((x[0] != x[-1]) & (x[0] == 0)), raw=True)
           .groupby('id').sum().astype(int).reset_index(name='count10'))

count_total = count01.merge(count10, on='id')

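Sometimes it gives the correct result and sometimes it does not. I suspect that the first value of each group may be assigned NaN inside apply and that this affects the result, but maybe not.

The result should be: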
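One way to check the NaN suspicion is to record what `rolling(2, min_periods=1).apply(..., raw=True)` actually hands to the function. The sketch below uses a small made-up series (not the data above) and a hypothetical helper named `record`. On this toy input the first window simply has length 1, and the `count01` condition from the question matches 1->0 changes rather than 0->1, which is one possible explanation for the inconsistent results, not a confirmed diagnosis.

```python
import pandas as pd

# Made-up series for illustration: two 0->1 changes and one 1->0 change.
limit = pd.Series([0, 1, 0, 1])

seen = []  # windows exactly as passed to the function by rolling(...).apply

def record(window):
    # hypothetical helper: store the window contents, return a dummy value
    seen.append(window.tolist())
    return 0.0

limit.rolling(2, min_periods=1).apply(record, raw=True)
print(seen)  # the first entry holds a single value, so x[0] and x[-1] coincide

# The condition used for count01 in the question: x[0] is the EARLIER value,
# so requiring x[0] == 1 selects windows that go from 1 to 0.
cond01 = limit.rolling(2, min_periods=1).apply(
    lambda x: float((x[0] != x[-1]) and (x[0] == 1)), raw=True)
print(int(cond01.sum()))
```

If the two counts happen to be equal, as in the expected result for the sample data, swapped labels would go unnoticed.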

id          | count01 | count10
-------------------------------
110000078451| 2       | 2

Thanks!

Edit: I updated my example to better match the real data.

Tags: python, pandas, dataframe, apply

Solution


You can first create a column that holds the transitions within the same id, then use pivot_table to count those transitions:

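df2 = df.shift()  # each row now holds the previous row's values
df2['limit'] = df2['limit'].bfill().astype(int)  # force limit to type int in shifted df
# label a transition only where the previous row has the same id and a different limit
df.loc[(df.id == df2.id) & (df.limit != df2.limit), 'transition'] = \
    df2.limit.astype(str) + df.limit.astype(str)

# count rows per (id, transition); 'date' only serves as a column to count
resul = df.pivot_table(index='id', columns='transition', aggfunc='count',
                       values='date', fill_value=0)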

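This gives (the ids 111 and 22 shown here do not appear in the sample data above and apparently come from an earlier version of the example):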

transition  01  10
id                
111          2   1
22           0   1

You can tidy up the presentation:

resul = resul.rename(columns=lambda x: 'count'+x).rename_axis('', axis=1).reset_index()

to finally get:

    id  count01  count10
0  111        2        1
1   22        0        1
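As an alternative to the pivot_table approach, the transitions can also be counted with a per-group shift. This is a minimal sketch, not the answer's method: it rebuilds only the 'id' and 'limit' columns of the question's sample (in the order listed), and the names `limits`, `prev` and `flags` are introduced here for illustration.

```python
import pandas as pd

# 'limit' values from the question's sample, in the order they are listed
limits = [0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
df = pd.DataFrame({'id': ['110000078451'] * len(limits), 'limit': limits})

# Previous 'limit' within the same id; NaN on the first row of each id,
# so the first row never counts as a transition.
prev = df.groupby('id')['limit'].shift()

flags = pd.DataFrame({
    'id': df['id'],
    'count01': (prev == 0) & (df['limit'] == 1),  # 0 followed by 1
    'count10': (prev == 1) & (df['limit'] == 0),  # 1 followed by 0
})

# Summing the boolean flags per id counts the transitions.
count_total = flags.groupby('id', as_index=False)[['count01', 'count10']].sum()
print(count_total)
```

Because the shift happens inside each group, no explicit check that consecutive rows share the same id is needed. The rows are assumed to be already sorted in the order in which changes should be counted.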
