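pandas - Merge in Pandas does not behave as expected
Problem description
I'm trying to compute z-scores for a subset of the columns in my dataframe (combo)
and then create new columns in that dataframe for those z-scores. Note that when zscores is pd.concat'ed, the resulting new columns are all NaN. That is the problem I need help with.
I think it may have to do with how concat adds the new columns, since there is no unique key to match on. But when I tried keeping the email in the intermediate zscores table, that didn't fix it. So it may be something else.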
zscores = combo.loc[:,pa_grade_cols].dropna(axis=0)
zscores = zscores.apply(zscore)
zscores = zscores.rename(lambda x:colrename(x, "zscore "), axis=1)
newcombo = pd.concat([combo, zscores], axis=1)
combo.iloc[4]:
email msilveira66@brandeis.edu
all pas 54.84
all partic 92.21
course 60.39
pa grade PA01 67.7
pa grade PA02 82
pa grade PA03 21
pa grade PA04 0
pa grade PA05 43
pa grade PA06 29
pa grade PA07 61
pa grade PA08 63
pa grade PA09 NaN
pa grade PA10 72
pa grade PA11 0
resub PA01 NaN
resub PA02 NaN
resub PA03 NaN
resub PA04 NaN
resub PA05 NaN
resub PA06 NaN
resub PA07 NaN
resub PA08 NaN
resub PA09 NaN
resub PA10 NaN
resub PA11 NaN
initial PA01 56
initial PA02 83.3333
initial PA03 30
initial PA04 0
initial PA05 61
initial PA06 42
initial PA07 80
initial PA08 90
initial PA09 NaN
initial PA10 97
initial PA11 0
resubmits 0
resub mean NaN
initial mean 53.9333
pa grade mean 43.87
Name: 4, dtype: object
zscores.iloc[4]:
zscore PA01 -0.562523
zscore PA02 -0.418858
zscore PA03 -1.722308
zscore PA04 -1.378762
zscore PA05 -2.291849
zscore PA06 -0.503729
zscore PA07 -0.343543
zscore PA08 -2.037249
zscore PA09 -0.064932
zscore PA10 -0.428859
zscore PA11 -0.735842
Name: 5, dtype: float64
newcombo:
email msilveira66@brandeis.edu
all pas 54.84
all partic 92.21
course 60.39
pa grade PA01 67.7
pa grade PA02 82
pa grade PA03 21
pa grade PA04 0
pa grade PA05 43
pa grade PA06 29
pa grade PA07 61
pa grade PA08 63
pa grade PA09 NaN
pa grade PA10 72
pa grade PA11 0
resub PA01 NaN
resub PA02 NaN
resub PA03 NaN
resub PA04 NaN
resub PA05 NaN
resub PA06 NaN
resub PA07 NaN
resub PA08 NaN
resub PA09 NaN
resub PA10 NaN
resub PA11 NaN
initial PA01 56
initial PA02 83.3333
initial PA03 30
initial PA04 0
initial PA05 61
initial PA06 42
initial PA07 80
initial PA08 90
initial PA09 NaN
initial PA10 97
initial PA11 0
resubmits 0
resub mean NaN
initial mean 53.9333
pa grade mean 43.87
zscore PA01 NaN
zscore PA02 NaN
zscore PA03 NaN
zscore PA04 NaN
zscore PA05 NaN
zscore PA06 NaN
zscore PA07 NaN
zscore PA08 NaN
zscore PA09 NaN
zscore PA10 NaN
zscore PA11 NaN
Name: 4, dtype: object
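Solution
This is expected behavior: dropna filters out every row of the subset that contains a NaN, and concat with axis=1 aligns on the index labels. So concat only adds values for the rows that survived the filter, and the new columns are filled with NaNs for every other row: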
import numpy as np
import pandas as pd

combos = pd.DataFrame({'A':list('abcdef'),
                       'B':[np.nan,5,4,5,5,4],
                       'C':[7,8,9,np.nan,2,3],
                       'D':[1,3,5,np.nan,1,0],
                       'E':[5,3,6,9,2,4],
                       'F':list('aaabbb')})
print (combos)
   A    B    C    D  E  F
0  a  NaN  7.0  1.0  5  a
1  b  5.0  8.0  3.0  3  a
2  c  4.0  9.0  5.0  6  a
3  d  5.0  NaN  NaN  9  b
4  e  5.0  2.0  1.0  2  b
5  f  4.0  3.0  0.0  4  b
# sample function (a stand-in for a real z-score, to keep the output easy to read)
def zscore(x):
    return x * 100
pa_grade_cols = ['B','C','D']
zscores = combos.loc[:,pa_grade_cols].dropna(axis=0)
zscores = zscores.apply(zscore)
zscores = zscores.add_prefix('zsores_')
newcombo = pd.concat([combos, zscores], axis=1)
print (newcombo)
   A    B    C    D  E  F  zsores_B  zsores_C  zsores_D
0  a  NaN  7.0  1.0  5  a       NaN       NaN       NaN
1  b  5.0  8.0  3.0  3  a     500.0     800.0     300.0
2  c  4.0  9.0  5.0  6  a     400.0     900.0     500.0
3  d  5.0  NaN  NaN  9  b       NaN       NaN       NaN
4  e  5.0  2.0  1.0  2  b     500.0     200.0     100.0
5  f  4.0  3.0  0.0  4  b     400.0     300.0       0.0
Details:
print (zscores)
   zsores_B  zsores_C  zsores_D
1     500.0     800.0     300.0
2     400.0     900.0     500.0
4     500.0     200.0     100.0
5     400.0     300.0       0.0
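If the goal is a z-score for every grade that exists, rather than only for rows with no missing grades, one option is to skip dropna and standardize each column directly. The following is a minimal sketch, not the original poster's code: it reuses a trimmed version of the sample frame above, the prefix zscore_ is chosen only for illustration, and it assumes the population standard deviation (ddof=0) is wanted, which is also the default of scipy.stats.zscore.

```python
import numpy as np
import pandas as pd

# Trimmed version of the sample frame used in the answer above.
combos = pd.DataFrame({
    'A': list('abcdef'),
    'B': [np.nan, 5, 4, 5, 5, 4],
    'C': [7, 8, 9, np.nan, 2, 3],
    'D': [1, 3, 5, np.nan, 1, 0],
})
pa_grade_cols = ['B', 'C', 'D']

# concat(axis=1) aligns on index labels, so the labels removed by dropna
# are exactly the rows that end up all-NaN in the new columns.
dropped = combos[pa_grade_cols].dropna(axis=0)
missing_labels = combos.index.difference(dropped.index)
print(list(missing_labels))

# Alternative: standardize each column without dropping any rows.
# mean() and std() skip NaN by default, so a NaN input stays NaN
# and every other cell gets a z-score.
sub = combos[pa_grade_cols]
zscores = (sub - sub.mean()) / sub.std(ddof=0)
zscores = zscores.add_prefix('zscore_')  # illustrative prefix

newcombo = pd.concat([combos, zscores], axis=1)
print(newcombo)
```

Note that this is not numerically identical to dropna followed by a z-score: here each column's mean and standard deviation use all of that column's non-missing values, whereas the dropna version uses only rows that are complete across all selected columns. Which one is appropriate depends on how the missing grades should be treated.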