python - Counting value changes in a column over time
Problem description
I have a DataFrame that looks like this:
from pandas import DataFrame

df = DataFrame({'date': {379724: '2017-01-31',
379725: '2017-01-31',
414510: '2017-02-14',
414509: '2017-02-28',
414511: '2017-02-28',
507215: '2017-04-27',
507213: '2017-04-27',
507214: '2017-04-27',
507235: '2017-04-27',
562139: '2017-04-27',
672967: '2017-07-27',
672968: '2017-07-27',
672969: '2017-07-27',
910729: '2017-12-07',
990263: '2018-01-30',
990265: '2018-01-30',
990264: '2018-01-30',
121543: '2018-06-26',
255129: '2018-09-20'},
'id': {379724: '110000078451',
379725: '110000078451',
414510: '110000078451',
414509: '110000078451',
414511: '110000078451',
507215: '110000078451',
507213: '110000078451',
507214: '110000078451',
507235: '110000078451',
562139: '110000078451',
672967: '110000078451',
672968: '110000078451',
672969: '110000078451',
910729: '110000078451',
990263: '110000078451',
990265: '110000078451',
990264: '110000078451',
121543: '110000078451',
255129: '110000078451'},
'limit': {379724: 0,
379725: 1,
414510: 1,
414509: 0,
414511: 0,
507215: 0,
507213: 0,
507214: 1,
507235: 0,
562139: 0,
672967: 0,
672968: 0,
672969: 0,
910729: 0,
990263: 0,
990265: 0,
990264: 0,
121543: 0,
255129: 0})
I need to count, for each 'id', how many times the value of 'limit' changes to another value.
The code I came up with is:
count01 = (df.groupby('id')['limit'].rolling(2, min_periods=1)
           .apply(lambda x: ((x[0] != x[-1]) & (x[0] == 1)), raw=True)
           .groupby('id').sum().astype(int).reset_index(name='count01'))
count10 = (df.groupby('id')['limit'].rolling(2, min_periods=1)
           .apply(lambda x: ((x[0] != x[-1]) & (x[0] == 0)), raw=True)
           .groupby('id').sum().astype(int).reset_index(name='count10'))
count_total = count01.merge(count10, on='id')
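Sometimes it gives the correct result and sometimes it does not. I suspect that apply may see NaN as the first value of a group and that this affects the result, but I am not sure.
The result should be: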
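One way to check the NaN hypothesis is to record the windows that rolling(...).apply actually passes to the function. The sketch below is only an illustration under assumptions: the series s and the helper record are made up for this example and are not part of the question's data, and the exact windows may depend on the pandas version in use.

```python
import pandas as pd

# Hypothetical toy series, not the question's data.
s = pd.Series([0, 1, 1, 0, 0])

seen = []  # every window handed to the function, in call order

def record(window):
    # With raw=True the window arrives as a NumPy array.
    seen.append(window.copy())
    # apply() expects a scalar back; the window length is enough here.
    return float(len(window))

s.rolling(2, min_periods=1).apply(record, raw=True)

# Inspect the first window to see whether it is padded with NaN
# or simply shorter than the requested window size.
for w in seen:
    print(w)
```

If the first printed window contains a single element rather than a NaN followed by a value, then x[0] and x[-1] refer to the same element there and the comparison in the lambda is simply False for that row.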
id | count01 | count10
-------------------------------
110000078451| 2 | 2
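Thanks!
Edit: I updated my example to better match the real data.
Solution
You can first create a column that records the transitions within the same id, then use pivot_table to count those transitions: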
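df2 = df.shift()  # df2 holds the previous row's values at each index label
df2['limit'] = df2['limit'].bfill().astype(int)  # the shift leaves NaN in the first row; fill it so limit can be cast back to int
# mark rows whose previous row has the same id but a different limit, e.g. '01' for 0 -> 1
df.loc[(df.id==df2.id)&(df.limit!=df2.limit),'transition'] = \
    df2.limit.astype(str)+df.limit.astype(str)
# count the marked rows per id and per transition type
resul = df.pivot_table(index='id', columns='transition', aggfunc='count',values='date', fill_value=0)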
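This gives (the ids 111 and 22 do not appear in the sample data above, so this output presumably comes from the question's example before it was edited):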
transition 01 10
id
111 2 1
22 0 1
You can tidy up the presentation:
resul = resul.rename(columns=lambda x: 'count'+x).rename_axis('', axis=1).reset_index()
to finally get:
id count01 count10
0 111 2 1
1 22 0 1
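As a possible alternative to shifting the whole frame, the same idea can be sketched with a per-group shift, so that the first row of each id never sees a value from another id. This is only a sketch under assumptions: the small frame below is made up (ids 'a' and 'b' are hypothetical, not the question's data), the rows are assumed to already be in time order, and limit is assumed to take only the values 0 and 1.

```python
import pandas as pd

# Made-up sample: two ids, rows already in chronological order.
df = pd.DataFrame({
    "id":    ["a", "a", "a", "a", "a", "b", "b", "b"],
    "limit": [0,   1,   1,   0,   1,   1,   0,   0],
})

# Previous 'limit' within the same id; NaN on the first row of each id.
prev = df.groupby("id")["limit"].shift()

# A change is a row that has a previous value which differs from the current one.
changed = prev.notna() & (prev != df["limit"])

counts = (
    df.assign(
        count01=changed & (prev == 0),  # previous 0, current 1
        count10=changed & (prev == 1),  # previous 1, current 0
    )
    .groupby("id")[["count01", "count10"]]
    .sum()
    .astype(int)
    .reset_index()
)
print(counts)
```

Because the shift is done per group, no comparison of ids between neighbouring rows is needed, and the NaN on each group's first row is excluded explicitly instead of being back-filled.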