pandas - Padding a DataFrame per key with a cumulative metric
Question
I have the following DataFrame:
import pandas as pd
before_padding = pd.DataFrame(data={'user_id': [1,1,1,1,2, 2,3],
'days_past': [1,2,3,4, 2, 3,2],
'pay': [11,12,13,16, 17,18,10]})
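For each user, it gives the user ID, the number of days the user has been in the system, and the cumulative amount paid so far. For my use case, I want to pad it so that every user has a row for each day between the minimum and maximum days_past found in the whole DataFrame. If a day has no payment record, it should take the previous day's payment value if one exists, or 0 otherwise, like this: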
after_padding=pd.DataFrame(data={'user_id': [1,1,1,1,2, 2,2,2,3,3,3,3],
'days_past': [1,2,3,4,1 ,2, 3,4,1,2,3,4],
'pay': [11,12,13, 16,0,17,18,18,0,10,10,10]})
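Thanks in advance!
Answer
Use set_index with unstack to reshape to wide format, forward fill the missing values, reshape back with stack, replace the missing values that remain at the start of each group with fillna, and finally call reset_index: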
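df = (before_padding.set_index(['user_id','days_past'])['pay']
        .unstack()                      # wide: one row per user, one column per day
        .ffill(axis=1)                  # carry the last payment forward across days
        .stack(dropna=False)            # back to long format, keeping the NaN rows
        .fillna(0, downcast='infer')    # days before the first payment become 0
        .reset_index(name='pay'))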
print(df)
    user_id  days_past  pay
0         1          1   11
1         1          2   12
2         1          3   13
3         1          4   16
4         2          1    0
5         2          2   17
6         2          3   18
7         2          4   18
8         3          1    0
9         3          2   10
10        3          3   10
11        3          4   10
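A possible alternative, offered only as a sketch and not part of the original answer: build the full (user_id, days_past) grid explicitly and reindex against it. The unstack approach only produces columns for days that appear for at least one user, so a day missing for every user would be skipped; the version below assumes days_past holds integers and that every integer from the global minimum to the global maximum should be present. It also avoids stack(dropna=False) and fillna(downcast=...), which newer pandas releases appear to have deprecated. The names days, full_index and padded are made up for this illustration.

```python
import pandas as pd

before_padding = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 2, 2, 3],
    "days_past": [1, 2, 3, 4, 2, 3, 2],
    "pay": [11, 12, 13, 16, 17, 18, 10],
})

# Assumption: days_past is an integer column, so the full range is
# every integer from the global minimum to the global maximum.
days = range(int(before_padding["days_past"].min()),
             int(before_padding["days_past"].max()) + 1)

# Full grid: every user crossed with every day in that range.
full_index = pd.MultiIndex.from_product(
    [before_padding["user_id"].unique(), days],
    names=["user_id", "days_past"],
)

padded = (
    before_padding.set_index(["user_id", "days_past"])["pay"]
    .reindex(full_index)        # missing (user, day) pairs become NaN
    .groupby(level="user_id")
    .ffill()                    # carry the last payment forward within each user
    .fillna(0)                  # days before a user's first payment become 0
    .astype(int)                # NaN forced floats; restore integers
    .reset_index()
)
print(padded)
```

For the sample data this gives the same rows as after_padding in the question. The groupby before ffill matters here: without it, the forward fill would leak one user's last payment into the next user's leading gaps.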