Merging in Pandas does not behave as expected

Problem description

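I am trying to compute z-scores for a subset of the columns in my DataFrame (combo) and then create new columns in that DataFrame for those z-scores. Note that when the z-scores are pd.concat'ed, the resulting new columns are all NaN. That is the problem I need help with.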

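I think this may have to do with how concat adds new columns, since there is no unique key to match on. But when I tried keeping the email column in the intermediate zscores table, it did not solve the problem. So it may be something else.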

zscores = combo.loc[:, pa_grade_cols].dropna(axis=0)
zscores = zscores.apply(zscore)
zscores = zscores.rename(lambda x: colrename(x, "zscore "), axis=1)
newcombo = pd.concat([combo, zscores], axis=1)

combo.iloc[4]: 

email            msilveira66@brandeis.edu
all pas                             54.84
all partic                          92.21
course                              60.39
pa grade PA01                        67.7
pa grade PA02                          82
pa grade PA03                          21
pa grade PA04                           0
pa grade PA05                          43
pa grade PA06                          29
pa grade PA07                          61
pa grade PA08                          63
pa grade PA09                         NaN
pa grade PA10                          72
pa grade PA11                           0
resub PA01                            NaN
resub PA02                            NaN
resub PA03                            NaN
resub PA04                            NaN
resub PA05                            NaN
resub PA06                            NaN
resub PA07                            NaN
resub PA08                            NaN
resub PA09                            NaN
resub PA10                            NaN
resub PA11                            NaN
initial PA01                           56
initial PA02                      83.3333
initial PA03                           30
initial PA04                            0
initial PA05                           61
initial PA06                           42
initial PA07                           80
initial PA08                           90
initial PA09                          NaN
initial PA10                           97
initial PA11                            0
resubmits                               0
resub mean                            NaN
initial mean                      53.9333
pa grade mean                       43.87
Name: 4, dtype: object

zscores.iloc[4]:

zscore PA01   -0.562523
zscore PA02   -0.418858
zscore PA03   -1.722308
zscore PA04   -1.378762
zscore PA05   -2.291849
zscore PA06   -0.503729
zscore PA07   -0.343543
zscore PA08   -2.037249
zscore PA09   -0.064932
zscore PA10   -0.428859
zscore PA11   -0.735842
Name: 5, dtype: float64

newcombo.iloc[4]:

email            msilveira66@brandeis.edu
all pas                             54.84
all partic                          92.21
course                              60.39
pa grade PA01                        67.7
pa grade PA02                          82
pa grade PA03                          21
pa grade PA04                           0
pa grade PA05                          43
pa grade PA06                          29
pa grade PA07                          61
pa grade PA08                          63
pa grade PA09                         NaN
pa grade PA10                          72
pa grade PA11                           0
resub PA01                            NaN
resub PA02                            NaN
resub PA03                            NaN
resub PA04                            NaN
resub PA05                            NaN
resub PA06                            NaN
resub PA07                            NaN
resub PA08                            NaN
resub PA09                            NaN
resub PA10                            NaN
resub PA11                            NaN
initial PA01                           56
initial PA02                      83.3333
initial PA03                           30
initial PA04                            0
initial PA05                           61
initial PA06                           42
initial PA07                           80
initial PA08                           90
initial PA09                          NaN
initial PA10                           97
initial PA11                            0
resubmits                               0
resub mean                            NaN
initial mean                      53.9333
pa grade mean                       43.87
zscore PA01                           NaN
zscore PA02                           NaN
zscore PA03                           NaN
zscore PA04                           NaN
zscore PA05                           NaN
zscore PA06                           NaN
zscore PA07                           NaN
zscore PA08                           NaN
zscore PA09                           NaN
zscore PA10                           NaN
zscore PA11                           NaN
Name: 4, dtype: object

Tags: pandas

Solution


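This is expected behavior. dropna removes every row of the subset that contains a NaN, and pd.concat with axis=1 aligns on index labels, so only the rows that survived the filter receive values; the other rows are filled with NaN: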

import numpy as np
import pandas as pd

combos = pd.DataFrame({'A': list('abcdef'),
                       'B': [np.nan, 5, 4, 5, 5, 4],
                       'C': [7, 8, 9, np.nan, 2, 3],
                       'D': [1, 3, 5, np.nan, 1, 0],
                       'E': [5, 3, 6, 9, 2, 4],
                       'F': list('aaabbb')})

print(combos)
   A    B    C    D  E  F
0  a  NaN  7.0  1.0  5  a
1  b  5.0  8.0  3.0  3  a
2  c  4.0  9.0  5.0  6  a
3  d  5.0  NaN  NaN  9  b
4  e  5.0  2.0  1.0  2  b
5  f  4.0  3.0  0.0  4  b

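# sample function used in place of a real z-score, to keep the output easy to read
def zscore(x):
    return x * 100

pa_grade_cols = ['B', 'C', 'D']
# dropna removes rows 0 and 3, so zscores keeps only index labels 1, 2, 4, 5
zscores = combos.loc[:, pa_grade_cols].dropna(axis=0)
zscores = zscores.apply(zscore)
zscores = zscores.add_prefix('zsores_')
# concat aligns on index labels; rows 0 and 3 have no match and become NaN
newcombo = pd.concat([combos, zscores], axis=1)
print(newcombo)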
   A    B    C    D  E  F  zsores_B  zsores_C  zsores_D
0  a  NaN  7.0  1.0  5  a       NaN       NaN       NaN
1  b  5.0  8.0  3.0  3  a     500.0     800.0     300.0
2  c  4.0  9.0  5.0  6  a     400.0     900.0     500.0
3  d  5.0  NaN  NaN  9  b       NaN       NaN       NaN
4  e  5.0  2.0  1.0  2  b     500.0     200.0     100.0
5  f  4.0  3.0  0.0  4  b     400.0     300.0       0.0
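
If the NaN rows in the result above are not wanted, one option is to skip the dropna step and compute the z-score column by column, since pandas' mean() and std() skip NaN by default. The following is only a minimal sketch under that assumption: the helper name nan_zscore, the 'zscore_' prefix and the small frame are made up for illustration, and it uses the population standard deviation (ddof=0), which is also the default of scipy.stats.zscore. Note that this changes which rows contribute to each column's mean, so the numbers will differ from the dropna version.

```python
import numpy as np
import pandas as pd

# Hypothetical data, same layout as the example above
combos = pd.DataFrame({'A': list('abcdef'),
                       'B': [np.nan, 5, 4, 5, 5, 4],
                       'C': [7, 8, 9, np.nan, 2, 3],
                       'D': [1, 3, 5, np.nan, 1, 0]})

def nan_zscore(col):
    # mean() and std() skip NaN by default; ddof=0 gives the population std
    return (col - col.mean()) / col.std(ddof=0)

pa_grade_cols = ['B', 'C', 'D']
# No dropna here, so the result keeps the same index as combos
zscores = combos[pa_grade_cols].apply(nan_zscore).add_prefix('zscore_')
newcombo = pd.concat([combos, zscores], axis=1)
print(newcombo)
```

With this approach a cell is NaN only where the original grade itself was missing, rather than for every row that had a NaN in any of the selected columns.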

Details

The intermediate zscores frame contains only the rows that survived dropna:

print(zscores)
   zsores_B  zsores_C  zsores_D
1     500.0     800.0     300.0
2     400.0     900.0     500.0
4     500.0     200.0     100.0
5     400.0     300.0       0.0
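
The index labels printed above explain both the NaN rows and the "Name: 5" in the question's zscores.iloc[4] output: after dropna, position and label no longer coincide, and concat with axis=1 aligns on labels. A small sketch for checking this on a made-up frame (the names df, sub and the 'z_' prefix are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows 0 and 3 contain a NaN
df = pd.DataFrame({'B': [np.nan, 5, 4, 5, 5, 4],
                   'C': [7, 8, 9, np.nan, 2, 3]})
sub = df.dropna(axis=0)

# Index labels that dropna removed; these rows become NaN after concat
missing = df.index.difference(sub.index)
print(missing.tolist())   # [0, 3]

# iloc is positional, loc is label-based: position 2 now holds label 4
print(sub.iloc[2].name)   # 4

# concat(axis=1) aligns on index labels, not on position
out = pd.concat([df, sub.add_prefix('z_')], axis=1)
print(out['z_B'].isna().tolist())
```

Running the same index.difference check on the real combo and zscores frames should list exactly the rows whose new columns come out as NaN.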
