python - Pandas: group by and transform on a condition, and apply the result to the whole column
Question
I have the following data frame:
import pandas as pd
df = pd.DataFrame({'Value': [0, 1, 2,3, 4,5,6,7,8,9],'Name': ['John', 'John', 'John','John', 'John','John','John','John','John','John']
,'City': ['A', 'B', 'A','B', 'A','B','B','A','B','A'],'City2': ['C', 'D', 'C','D', 'C','D','D','C','D','C']})
df
Value Name City City2
0 0 John A C
1 1 John B D
2 2 John A C
3 3 John B D
4 4 John A C
5 5 John B D
6 6 John B D
7 7 John A C
8 8 John B D
9 9 John A C
I am trying to take the average of Value over the rows where City2 equals 'C', but write it to every row of a new column.
I tried:
df['C_Average'] = df[df['City2'] == 'C'].groupby(['Name','City'])['Value'].transform(lambda v: v.nsmallest(11).mean())
df
Value Name City City2 C_Average
0 0 John A C 4.4
1 1 John B D NaN
2 2 John A C 4.4
3 3 John B D NaN
4 4 John A C 4.4
5 5 John B D NaN
6 6 John B D NaN
7 7 John A C 4.4
8 8 John B D NaN
9 9 John A C 4.4
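As you can see, the new column is added, but I want the value applied to the whole column, not only to the rows where City2 equals C. That is, the whole column should show 4.4. Any ideas?
Thanks!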
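Solution
One trick is to replace the non-matching values with missing values instead of filtering. This keeps every row in the frame, so the later transform returns a value for each row of a group: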
print (df.assign(Value = df['Value'].where(df['City2']== 'C')))
Value Name City City2
0 0.0 John A C
1 NaN John B D
2 2.0 John A C
3 NaN John B D
4 4.0 John A C
5 NaN John B D
6 NaN John B D
7 7.0 John A C
8 NaN John B D
9 9.0 John A C
However, the sample data has no C rows in the group (John, B), so the output is the same as before:
df['C_Average'] = (df.assign(Value = df['Value'].where(df['City2']== 'C'))
.groupby(['Name','City'])['Value']
.transform(lambda v: v.nsmallest(11).mean()))
print (df)
Value Name City City2 C_Average
0 0 John A C 4.4
1 1 John B D NaN
2 2 John A C 4.4
3 3 John B D NaN
4 4 John A C 4.4
5 5 John B D NaN
6 6 John B D NaN
7 7 John A C 4.4
8 8 John B D NaN
9 9 John A C 4.4
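If the goal is literally one value for every row of a Name, regardless of City, one option is to drop City from the grouping keys. The following is a minimal sketch under that assumption; it changes the grouping used in the question, and it uses a plain mean in place of nsmallest(11).mean(), which gives the same number here only because no group has more than 11 values.

```python
import pandas as pd

df = pd.DataFrame({
    'Value': list(range(10)),
    'Name': ['John'] * 10,
    'City': ['A', 'B', 'A', 'B', 'A', 'B', 'B', 'A', 'B', 'A'],
    'City2': ['C', 'D', 'C', 'D', 'C', 'D', 'D', 'C', 'D', 'C'],
})

# Keep Value only where City2 == 'C'; the other rows become NaN but stay in place.
masked = df['Value'].where(df['City2'] == 'C')

# Assumption: one average per Name, ignoring City. The mean skips the NaN rows.
df['C_Average'] = masked.groupby(df['Name']).transform('mean')
print(df)
```

With the sample data every row then carries the average of the five C rows. The rest of this answer keeps the original (Name, City) grouping.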
If the data is changed so that the group (John, B) has a C row, it works as expected:
df = pd.DataFrame({'Value': [0, 1, 2,3, 4,5,6,7,8,9],'Name': ['John', 'John', 'John','John', 'John','John','John','John','John','John']
,'City': ['A', 'B', 'A','B', 'A','B','B','A','B','A'],'City2': ['C', 'C', 'C','D', 'C','D','D','C','D','C']})
print (df)
Value Name City City2
0 0 John A C
1 1 John B C <- one row for C for group John, B
2 2 John A C
3 3 John B D
4 4 John A C
5 5 John B D
6 6 John B D
7 7 John A C
8 8 John B D
9 9 John A C
df['C_Average'] = (df.assign(Value = df['Value'].where(df['City2']== 'C'))
.groupby(['Name','City'])['Value']
.transform(lambda v: v.nsmallest(11).mean()))
print (df)
Value Name City City2 C_Average
0 0 John A C 4.4
1 1 John B C 1.0
2 2 John A C 4.4
3 3 John B D 1.0
4 4 John A C 4.4
5 5 John B D 1.0
6 6 John B D 1.0
7 7 John A C 4.4
8 8 John B D 1.0
9 9 John A C 4.4
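The same column can probably also be built in two steps: aggregate the C rows per group, then join the result back onto every row. This is a sketch of an alternative, not the method used in this answer; the names c_means and out are made up for illustration, and a plain mean stands in for nsmallest(11).mean().

```python
import pandas as pd

df = pd.DataFrame({
    'Value': list(range(10)),
    'Name': ['John'] * 10,
    'City': ['A', 'B', 'A', 'B', 'A', 'B', 'B', 'A', 'B', 'A'],
    'City2': ['C', 'C', 'C', 'D', 'C', 'D', 'D', 'C', 'D', 'C'],
})

# Mean of Value over the C rows only: one number per (Name, City) group.
c_means = (df[df['City2'] == 'C']
           .groupby(['Name', 'City'])['Value']
           .mean()
           .rename('C_Average'))

# Join on the group keys so that every row of df receives its group's value.
# A group without any C row would get NaN here.
out = df.join(c_means, on=['Name', 'City'])
print(out)
```

The join keeps all rows of df, so the D rows of group B also receive that group's average.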
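Your solution gives a different result, because filtering drops the non-C rows before grouping, so those rows get NaN when the result is aligned back to df: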
df['C_Average'] = df[df['City2'] == 'C'].groupby(['Name','City'])['Value'].transform(lambda v: v.nsmallest(11).mean())
print (df)
Value Name City City2 C_Average
0 0 John A C 4.4
1 1 John B C 1.0
2 2 John A C 4.4
3 3 John B D NaN
4 4 John A C 4.4
5 5 John B D NaN
6 6 John B D NaN
7 7 John A C 4.4
8 8 John B D NaN
9 9 John A C 4.4
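The difference comes from index alignment when the result is assigned to a column. The sketch below, using the changed data and a plain mean for brevity, compares the index of the two results; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Value': list(range(10)),
    'Name': ['John'] * 10,
    'City': ['A', 'B', 'A', 'B', 'A', 'B', 'B', 'A', 'B', 'A'],
    'City2': ['C', 'C', 'C', 'D', 'C', 'D', 'D', 'C', 'D', 'C'],
})
is_c = df['City2'] == 'C'

# Filtering first: the result has one entry per C row only.
filtered = df[is_c].groupby(['Name', 'City'])['Value'].transform('mean')

# Masking with where: the result has one entry per row of df.
masked = (df.assign(Value=df['Value'].where(is_c))
            .groupby(['Name', 'City'])['Value']
            .transform('mean'))

print(filtered.index.tolist())  # [0, 1, 2, 4, 7, 9]
print(masked.index.tolist())    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Assigning to df aligns on the index, so the labels missing from
# the filtered result turn into NaN.
print(filtered.reindex(df.index).isna().sum())
```

Rows 3, 5, 6 and 8 are absent from the filtered result, which is why they show NaN in the output above.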