python - Adding a sequence along two conditioning variables in Python pandas
Problem description
In R, you can easily add a sequence along two (or even more) conditioning variables with ave(), as follows:
# create a dataframe
dat = data.frame(
FactorA = c(rep('a1', 10), rep('a2', 10)),
FactorB = c(rep('b1', 5), rep('b2', 5), rep('b1', 5), rep('b2', 5)),
DependentVar = rnorm(20)
)
# add ordering given combination of two factors
dat$Order <- ave(dat$DependentVar, dat$FactorA, dat$FactorB,
FUN=seq_along)
What is the analogue in Python pandas?
Added June 22, 2020:
In addition, if you interleave the levels of FactorA and FactorB by "shuffling" them, for example:
# a slightly "shuffled" dataframe
dat2 = data.frame(
FactorA = c(rep('a1', 6), rep('a2', 6),
rep('a1', 4), rep('a2', 4)),
FactorB = c(rep('b1', 3), rep('b2', 3), rep('b1', 3), rep('b2', 3),
rep('b1', 2), rep('b2', 2), rep('b1', 2), rep('b2', 2)),
DependentVar = rnorm(20)
)
ave() will continue the numbering for each combination across the separated blocks:
dat2$Order <- ave(dat2$DependentVar, dat2$FactorA, dat2$FactorB,
FUN=seq_along)
dat2
FactorA FactorB DependentVar Order
1 a1 b1 1.3814360 1
2 a1 b1 1.0702582 2
3 a1 b1 -1.1974390 3
4 a1 b2 -1.1687711 1
5 a1 b2 -0.7584645 2
6 a1 b2 -0.5541912 3
7 a2 b1 -0.3083331 1
8 a2 b1 0.7707984 2
9 a2 b1 2.4709730 3
10 a2 b2 0.1768273 1
11 a2 b2 0.5687605 2
12 a2 b2 0.7360105 3
13 a1 b1 0.9253223 4
14 a1 b1 -0.3190011 5
15 a1 b2 -0.2657454 4
16 a1 b2 -0.1617810 5
17 a2 b1 0.9634501 4
18 a2 b1 -0.6749173 5
19 a2 b2 0.8138765 4
20 a2 b2 -1.1075720 5
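Can Python (1) mark which "appearance" of a combination each row belongs to, and (2) reset the ordering at each new appearance, as shown below?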
FactorA FactorB DependentVar Order OrderReset WhichAppearance
1 a1 b1 1.3814360 1 1 1
2 a1 b1 1.0702582 2 2 1
3 a1 b1 -1.1974390 3 3 1
4 a1 b2 -1.1687711 1 1 1
5 a1 b2 -0.7584645 2 2 1
6 a1 b2 -0.5541912 3 3 1
7 a2 b1 -0.3083331 1 1 1
8 a2 b1 0.7707984 2 2 1
9 a2 b1 2.4709730 3 3 1
10 a2 b2 0.1768273 1 1 1
11 a2 b2 0.5687605 2 2 1
12 a2 b2 0.7360105 3 3 1
13 a1 b1 0.9253223 4 1 2
14 a1 b1 -0.3190011 5 2 2
15 a1 b2 -0.2657454 4 1 2
16 a1 b2 -0.1617810 5 2 2
17 a2 b1 0.9634501 4 1 2
18 a2 b1 -0.6749173 5 2 2
19 a2 b2 0.8138765 4 1 2
20 a2 b2 -1.1075720 5 2 2
Solution
In Python with pandas, you can do this:
df_data['Order'] = df_data.groupby(['FactorA', 'FactorB']).cumcount() + 1
MVCE:
import pandas as pd
from io import StringIO
dat_text = StringIO(""" FactorA FactorB DependentVar
1 a1 b1 -1.1435908
2 a1 b1 -0.5799404
3 a1 b1 0.0680380
4 a1 b1 0.1143230
5 a1 b1 0.7673287
6 a1 b2 1.4769585
7 a1 b2 -1.3399984
8 a1 b2 -0.4832071
9 a1 b2 -2.3764355
10 a1 b2 0.2668480
11 a2 b1 -0.7376859
12 a2 b1 -0.4141878
13 a2 b1 -0.5159797
14 a2 b1 -1.3888258
15 a2 b1 0.1497270
16 a2 b2 0.1803052
17 a2 b2 0.8547880
18 a2 b2 0.2372080
19 a2 b2 0.3139455
20 a2 b2 0.7266356""")
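# Split on runs of two or more whitespace characters; a raw string avoids the invalid-escape warning
df_data = pd.read_csv(dat_text, sep=r'\s\s+', engine='python')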
print(df_data)
Output:
FactorA FactorB DependentVar
1 a1 b1 -1.143591
2 a1 b1 -0.579940
3 a1 b1 0.068038
4 a1 b1 0.114323
5 a1 b1 0.767329
6 a1 b2 1.476958
7 a1 b2 -1.339998
8 a1 b2 -0.483207
9 a1 b2 -2.376435
10 a1 b2 0.266848
11 a2 b1 -0.737686
12 a2 b1 -0.414188
13 a2 b1 -0.515980
14 a2 b1 -1.388826
15 a2 b1 0.149727
16 a2 b2 0.180305
17 a2 b2 0.854788
18 a2 b2 0.237208
19 a2 b2 0.313945
20 a2 b2 0.726636
Use groupby together with cumcount:
df_data['Order'] = df_data.groupby(['FactorA', 'FactorB']).cumcount() + 1
print(df_data)
Output:
FactorA FactorB DependentVar Order
1 a1 b1 -1.143591 1
2 a1 b1 -0.579940 2
3 a1 b1 0.068038 3
4 a1 b1 0.114323 4
5 a1 b1 0.767329 5
6 a1 b2 1.476958 1
7 a1 b2 -1.339998 2
8 a1 b2 -0.483207 3
9 a1 b2 -2.376435 4
10 a1 b2 0.266848 5
11 a2 b1 -0.737686 1
12 a2 b1 -0.414188 2
13 a2 b1 -0.515980 3
14 a2 b1 -1.388826 4
15 a2 b1 0.149727 5
16 a2 b2 0.180305 1
17 a2 b2 0.854788 2
18 a2 b2 0.237208 3
19 a2 b2 0.313945 4
20 a2 b2 0.726636 5
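A point worth spelling out: cumcount() is zero-based (hence the + 1), and it numbers rows in their original order within each group whether or not the group's rows are contiguous. That is the same behaviour as ave(..., FUN=seq_along) in R. Here is a minimal sketch on a small made-up frame (the name toy and its values are only for illustration, not from the question):

```python
import pandas as pd

# Hypothetical toy data: the (FactorA, FactorB) combinations are interleaved on purpose
toy = pd.DataFrame({
    "FactorA": ["a1", "a1", "a2", "a1", "a2", "a1"],
    "FactorB": ["b1", "b2", "b1", "b1", "b1", "b2"],
})

# cumcount() starts at 0 within each group, so add 1 for a 1-based sequence
toy["Order"] = toy.groupby(["FactorA", "FactorB"]).cumcount() + 1
print(toy)
```

Each combination gets its own running counter, so the second (a1, b1) row receives 2 even though other combinations sit between the two rows.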
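Update to answer the "Added June 22, 2020" part:
# Here df is assumed to hold the shuffled data (dat2 from the question) with the Order column already added
# Create a helper column that starts a new group id whenever FactorA or FactorB differs from the previous row
df['newgroup'] = (df[['FactorA', 'FactorB']] != df[['FactorA', 'FactorB']].shift()).any(axis=1).cumsum()
# Use cumcount to number the rows within each consecutive run
df['Order Reset'] = df.groupby('newgroup').cumcount() + 1
# Use factorize to number the appearances of each (FactorA, FactorB) combination
df['Appearance'] = df.groupby(['FactorA', 'FactorB'])['newgroup'].transform(lambda x: x.factorize()[0] + 1)
df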
Output:
FactorA FactorB DependentVar Order newgroup Order Reset Appearance
1 a1 b1 1.381436 1 1 1 1
2 a1 b1 1.070258 2 1 2 1
3 a1 b1 -1.197439 3 1 3 1
4 a1 b2 -1.168771 1 2 1 1
5 a1 b2 -0.758465 2 2 2 1
6 a1 b2 -0.554191 3 2 3 1
7 a2 b1 -0.308333 1 3 1 1
8 a2 b1 0.770798 2 3 2 1
9 a2 b1 2.470973 3 3 3 1
10 a2 b2 0.176827 1 4 1 1
11 a2 b2 0.568761 2 4 2 1
12 a2 b2 0.736010 3 4 3 1
13 a1 b1 0.925322 4 5 1 2
14 a1 b1 -0.319001 5 5 2 2
15 a1 b2 -0.265745 4 6 1 2
16 a1 b2 -0.161781 5 6 2 2
17 a2 b1 0.963450 4 7 1 2
18 a2 b1 -0.674917 5 7 2 2
19 a2 b2 0.813877 4 8 1 2
20 a2 b2 -1.107572 5 8 2 2
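For anyone who wants to run the update end to end, here is a self-contained sketch that rebuilds the layout of dat2 directly in pandas. The DependentVar values are freshly generated random numbers, so they will not match the ones printed above; only the factor layout is taken from the question. The dense-rank line at the end is my own lambda-free variant of the Appearance step, not part of the original answer, and it relies on newgroup increasing from top to bottom.

```python
import numpy as np
import pandas as pd

# Same factor layout as dat2; the random values are an assumption for illustration
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "FactorA": ["a1"] * 6 + ["a2"] * 6 + ["a1"] * 4 + ["a2"] * 4,
    "FactorB": (["b1"] * 3 + ["b2"] * 3) * 2 + (["b1"] * 2 + ["b2"] * 2) * 2,
    "DependentVar": rng.normal(size=20),
})
keys = ["FactorA", "FactorB"]

# Running count per combination, continuing across separated blocks (like ave)
df["Order"] = df.groupby(keys).cumcount() + 1

# A new run starts wherever either key differs from the previous row
df["newgroup"] = (df[keys] != df[keys].shift()).any(axis=1).cumsum()

# Count restarts at 1 inside every consecutive run
df["Order Reset"] = df.groupby("newgroup").cumcount() + 1

# Which appearance of the combination this run is (approach from the answer)
df["Appearance"] = df.groupby(keys)["newgroup"].transform(
    lambda x: x.factorize()[0] + 1
)

# Variant (assumption, not from the answer): dense rank of the run id per combination
appearance_rank = df.groupby(keys)["newgroup"].rank(method="dense").astype(int)

print(df)
```

The comparison against shift() marks the first row as a change as well, because the shifted row is all NaN, which is why newgroup starts at 1 rather than 0.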