Replacing values in a specific column with the mean

Question

I'm trying to replace user_score with the mean user_score for the game's platform and genre. Here is my code:

dft = new_df.query('user_score != "tbd" & user_score.isnull()')
df_typical_user_ratio_by_platform = dft.groupby(['platform', 'genre'])['user_score'].apply(lambda x: x.sample(1).iloc[0])

def correct_user_score(row):
    platform = row['platform']
    genre = row['genre']
    if (row['user_score'] == 'tbd' or pd.isnull(row['user_score']) or row['user_score']=='nan'):
        u = df_typical_user_ratio_by_platform.loc[[platform, genre]].head(1).astype('float')
        uScore = ", ".join(map(str, u)) 
    else:
        uScore = row['user_score']
        
    return uScore

row = pd.Series(data=row_values, index=['user_score', 'platform', 'genre'])
correct_user_score(row)
new_df['user_score'] = new_df.apply(correct_user_score, axis=1)
new_df.sample(40)
# df['user_score'] = df['user_score'].astype('int')

This is the result. user_score is currently an object column. I'm not sure how to replace the NaN values. I tried if u = 'nan', but that didn't work. Any suggestions?

https://imgur.com/WEDUdOh
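One likely reason a check like if u = 'nan' fails is that a missing value in pandas is a float NaN, not the string 'nan', and NaN never compares equal to anything, not even itself. The minimal sketch below (the three-value Series is made up for illustration) shows that comparison failing and the reliable alternatives, pd.isna() and pd.to_numeric(..., errors="coerce").

```python
import numpy as np
import pandas as pd

# Made-up column resembling user_score: a number stored as text, "tbd", and a missing value
s = pd.Series(["8.5", "tbd", np.nan], dtype=object)

missing = s.iloc[2]
# NaN is a float, not the string 'nan', and it is not even equal to another NaN
print(missing == "nan")    # False
print(missing == np.nan)   # False
# pd.isna() is the reliable test for a missing value
print(pd.isna(missing))    # True

# Vectorised: coerce "tbd" (and any other non-numeric text) to NaN,
# then a single isna() call flags every value that needs replacing
numeric = pd.to_numeric(s, errors="coerce")
print(numeric.isna().tolist())  # [False, True, True]
```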

Tags: python, pandas

Answer


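  • Coerce invalid values such as "tbd" to NaN with pd.to_numeric(..., errors="coerce")
  • Use fillna() with whatever calculation you want (here, the group mean)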
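import numpy as np
import pandas as pd

s = 20
df = pd.DataFrame({"userid": np.random.randint(1, 5, s),
                   "platform": np.random.choice(["windows", "macos", "ios", "android"], s),
                   "userscore": np.random.randint(1, 10, s)})

# let's splat some scores: in the first 10 rows, 7 becomes "tbd" and 6 becomes NaN
# (the default is cast to object so ints, "tbd" and NaN can share one column)
df = df.assign(userscore=np.select([(df.userscore == 7) & (df.index < 10),
                                    (df.userscore == 6) & (df.index < 10)],
                                   ["tbd", np.nan], df.userscore.astype(object)))

df["bad"] = df.userscore  # keep the raw values for comparison
# coerce "tbd" and any other non-numeric value to NaN
df = df.assign(userscore=pd.to_numeric(df.userscore, errors="coerce"))
# fill each NaN with the mean score of its (userid, platform) group
df["userscore"] = df.userscore.fillna(df.groupby(["userid", "platform"])["userscore"].transform("mean"))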
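The same two steps can be sketched against the column names from the question (user_score, platform, genre), grouping by platform and genre instead of the answer's userid and platform. The frame below is a hypothetical stand-in for new_df; its platform names and scores are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's new_df (made-up values)
new_df = pd.DataFrame({
    "platform": ["PS4", "PS4", "PS4", "X360", "X360", "X360"],
    "genre": ["Action", "Action", "Action", "Sports", "Sports", "Sports"],
    "user_score": ["8.0", "tbd", "6.0", "7.5", np.nan, "tbd"],
})

# 1. Coerce "tbd", NaN and any other non-numeric text to NaN; the column becomes float64
new_df["user_score"] = pd.to_numeric(new_df["user_score"], errors="coerce")

# 2. Mean score per (platform, genre), broadcast back to every row of its group
group_mean = new_df.groupby(["platform", "genre"])["user_score"].transform("mean")

# 3. Fill only the missing scores with their group's mean
new_df["user_score"] = new_df["user_score"].fillna(group_mean)
print(new_df)
```

With these values, the "tbd" in the PS4/Action group becomes 7.0 (the mean of 8.0 and 6.0), and both missing X360/Sports scores become 7.5. A (platform, genre) group with no valid score at all has a NaN mean, so its rows would stay NaN after fillna().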

Output

    userid  platform  userscore  bad
 0       3       ios          8    8
 1       3       ios          5    5
 2       1     macos        4.5  tbd
 3       2     macos          3    3
 4       2   android          3    3
 5       2       ios          4    4
 6       1     macos          5    5
 7       4   android          8  NaN
 8       1     macos          4    4
 9       2   windows          2    2
10       2   android          1    1
11       4   windows          5    5
12       3   android          2    2
13       2   windows          9    9
14       3   android          8    8
15       2   windows          1    1
16       4   windows          8    8
17       2   windows          4    4
18       2       ios          3    3
19       4   android          8    8
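As a cross-check of the table, the two filled rows can be recomputed from the other rows in their groups, with values copied from the output above: row 2 (userid 1, macos, originally "tbd") gets (5 + 4) / 2 = 4.5 from rows 6 and 8, and row 7 (userid 4, android, originally NaN) gets 8 from row 19. A small sketch of that check:

```python
import numpy as np
import pandas as pd

# Rows 2, 6, 8 (userid 1, macos) and 7, 19 (userid 4, android) from the output table,
# with their raw "bad" values
subset = pd.DataFrame(
    {"userid": [1, 1, 1, 4, 4],
     "platform": ["macos", "macos", "macos", "android", "android"],
     "bad": ["tbd", 5, 4, np.nan, 8]},
    index=[2, 6, 8, 7, 19],
)

# Same recipe as the answer: coerce to numeric, then fill NaN with the group mean
subset["userscore"] = pd.to_numeric(subset["bad"], errors="coerce")
subset["userscore"] = subset["userscore"].fillna(
    subset.groupby(["userid", "platform"])["userscore"].transform("mean"))

print(subset.loc[[2, 7], "userscore"].tolist())  # [4.5, 8.0]
```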
