python - Pandas cumcount() on multiple columns
Problem description
I have a DataFrame that looks like this:
import pandas as pd

data = {'exercise': ['squat', 'squat', 'squat', 'squat', 'bench', 'bench', 'bench', 'bench', 'squat', 'squat', 'squat', 'squat', 'bench', 'bench', 'bench', 'bench'],
'session': [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
'weight': [100, 100, 120, 120, 80, 80, 100, 110, 120, 130, 140, 150, 80, 90, 100, 110],
'velocity': [0.30, 0.25, 0.20, 0.15, 0.30, 0.25, 0.20, 0.15, 0.30, 0.25, 0.20, 0.15, 0.30, 0.25, 0.20, 0.15]}
df = pd.DataFrame(data, columns = data.keys())
print(df)
    exercise  session  weight  velocity
0      squat        0     100      0.30
1      squat        0     100      0.25
2      squat        0     120      0.20
3      squat        0     120      0.15
4      bench        0      80      0.30
5      bench        0      80      0.25
6      bench        0     100      0.20
7      bench        0     110      0.15
8      squat        1     120      0.30
9      squat        1     130      0.25
10     squat        1     140      0.20
11     squat        1     150      0.15
12     bench        1      80      0.30
13     bench        1      90      0.25
14     bench        1     100      0.20
15     bench        1     110      0.15
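What I'd like to do is add two columns, one for the set number and another for the rep number. The set number should increase by 1 every time the weight changes while exercise and session stay the same, otherwise reset to 0.
The rep number should increase by 1 every time the velocity changes while exercise, session and weight stay the same, otherwise reset to 0.
The logic I wrote above is flawed. What I mean is that the set number should increase with each change in weight (row by row), but reset to 0 if the exercise or session changes.
The rep count should be the row number within each set.
Like this: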
data = {'exercise': ['squat', 'squat', 'squat', 'squat', 'bench', 'bench', 'bench', 'bench', 'squat', 'squat', 'squat', 'squat', 'bench', 'bench', 'bench', 'bench'],
'session': [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
'weight': [100, 100, 120, 120, 80, 80, 100, 110, 120, 130, 140, 150, 80, 90, 100, 110],
'velocity': [0.30, 0.25, 0.20, 0.15, 0.30, 0.25, 0.20, 0.15, 0.30, 0.25, 0.20, 0.15, 0.30, 0.25, 0.20, 0.15],
'set': [0, 0, 1, 1, 0, 0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3],
'rep': [0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}
df = pd.DataFrame(data, columns = data.keys())
print(df)
    exercise  session  weight  velocity  set  rep
0      squat        0     100      0.30    0    0
1      squat        0     100      0.25    0    1
2      squat        0     120      0.20    1    0
3      squat        0     120      0.15    1    1
4      bench        0      80      0.30    0    0
5      bench        0      80      0.25    0    1
6      bench        0     100      0.20    1    0
7      bench        0     110      0.15    2    0
8      squat        1     120      0.30    0    0
9      squat        1     130      0.25    1    0
10     squat        1     140      0.20    2    0
11     squat        1     150      0.15    3    0
12     bench        1      80      0.30    0    0
13     bench        1      90      0.25    1    0
14     bench        1     100      0.20    2    0
15     bench        1     110      0.15    3    0
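To make the rule concrete, here is a minimal plain-Python sketch of the numbering I have in mind. The helper name `number_sets_and_reps` is made up for illustration, and the loop is only a reference for the expected values, not the vectorized pandas solution I'm after.

```python
def number_sets_and_reps(rows):
    """Hypothetical helper: rows is an iterable of (exercise, session, weight)."""
    sets, reps = [], []
    prev_key = prev_weight = None
    set_no = rep_no = 0
    for exercise, session, weight in rows:
        key = (exercise, session)
        if key != prev_key:
            # New exercise or session: both counters start over.
            set_no, rep_no = 0, 0
        elif weight != prev_weight:
            # Same exercise and session, weight changed: next set, first rep.
            set_no, rep_no = set_no + 1, 0
        else:
            # Same set: next rep.
            rep_no += 1
        sets.append(set_no)
        reps.append(rep_no)
        prev_key, prev_weight = key, weight
    return sets, reps


exercise = ['squat'] * 4 + ['bench'] * 4 + ['squat'] * 4 + ['bench'] * 4
session = [0] * 8 + [1] * 8
weight = [100, 100, 120, 120, 80, 80, 100, 110,
          120, 130, 140, 150, 80, 90, 100, 110]

expected_set, expected_rep = number_sets_and_reps(zip(exercise, session, weight))
print(expected_set)
print(expected_rep)
```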
I think this should be possible using groupby and cumcount, but I'm struggling to get it to work.
Solution
Use GroupBy.transform with factorize, together with GroupBy.cumcount:
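# Set number: label each distinct weight within an (exercise, session) group by order of first appearance
df['set1'] = (df.groupby(['exercise','session'])['weight']
                .transform(lambda x: pd.factorize(x)[0]))
# Rep number: running row count within each (exercise, session, weight) group
df['rep1'] = df.groupby(['exercise','session','weight']).cumcount()
print(df)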
    exercise  session  weight  velocity  set  rep  set1  rep1
0      squat        0     100      0.30    0    0     0     0
1      squat        0     100      0.25    0    1     0     1
2      squat        0     120      0.20    1    0     1     0
3      squat        0     120      0.15    1    1     1     1
4      bench        0      80      0.30    0    0     0     0
5      bench        0      80      0.25    0    1     0     1
6      bench        0     100      0.20    1    0     1     0
7      bench        0     110      0.15    2    0     2     0
8      squat        1     120      0.30    0    0     0     0
9      squat        1     130      0.25    1    0     1     0
10     squat        1     140      0.20    2    0     2     0
11     squat        1     150      0.15    3    0     3     0
12     bench        1      80      0.30    0    0     0     0
13     bench        1      90      0.25    1    0     1     0
14     bench        1     100      0.20    2    0     2     0
15     bench        1     110      0.15    3    0     3     0
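One caveat worth noting, as a hedged sketch rather than part of the answer above: `pd.factorize` numbers distinct values by first appearance, so if a weight recurs later in the same exercise and session (say 100, 120, then 100 again), it gets its earlier set number back. If every change in weight should start a new set, one option is to compare each row with the previous one via `shift` and take a cumulative sum. The sample data and the column names `set_factorize`, `set_change` and `rep_change` below are assumptions made for illustration.

```python
import pandas as pd

# Illustrative data: the squat weight returns to 100 after a set at 120.
df = pd.DataFrame({
    'exercise': ['squat'] * 5 + ['bench'] * 2,
    'session': [0] * 7,
    'weight': [100, 100, 120, 100, 100, 80, 80],
})

weights = df.groupby(['exercise', 'session'])['weight']

# factorize: the second block of 100s is labelled 0 again.
df['set_factorize'] = weights.transform(lambda s: pd.factorize(s)[0])

# Change detection: a new set starts whenever the weight differs from the
# previous row of the same group (the first row always counts as a change).
df['set_change'] = weights.transform(lambda s: s.ne(s.shift()).cumsum() - 1)

# Reps are then counted within each (exercise, session, set) block.
df['rep_change'] = df.groupby(['exercise', 'session', 'set_change']).cumcount()

print(df)
```

On the data in the question both approaches give the same numbers, because no weight comes back after a change within the same exercise and session.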