Replacing NaN values with the mean

Problem description

In a Jupyter notebook, I have to replace the NaN values in s_months and incidents with their respective means.
Input data:

    Types   c_years     o_periods   s_months    incidents
0   1       1           1           127.0       0.0
1   1       1           2           63.0        0.0
2   1       2           1           1095.0      3.0
3   1       2           2           1095.0      4.0
4   1       3           1           1512.0      6.0
5   1       3           2           3353.0      18.0
6   1       4           1           NaN         NaN
7   1       4           2           2244.0      11.0
14  2       4           1           NaN         NaN

I tried the code below, but it does not seem to work. I also tried different variants, such as replace and transform.

df.fillna['s_months'] = df.fillna(df.grouby(['types' , 'o_periods']['s_months','incidents']).tranform('mean'),inplace = True)
                 s_months  incidents
Types o_periods                     
1     1               911          3
      2              1688          8
2     1             26851         36
      2             14440         36
3     1               914          2
      2               862          1
4     1               296          0
      2               889          3
5     1               663          4
      2              1046          6

Tags: python, pandas, jupyter

Solution

Starting from your DataFrame:

>>> import pandas as pd
>>> from io import StringIO

>>> df = pd.read_csv(StringIO("""
Types,c_years,o_periods,s_months,incidents
0,1,1,1,127.0,0.0
1,1,1,2,63.0,0.0
2,1,2,1,1095.0,3.0
3,1,2,2,1095.0,4.0
4,1,3,1,1512.0,6.0
5,1,3,2,3353.0,18.0
6,1,4,1,NaN,NaN
7,1,4,2,2244.0,11.0
14,2,4,1,NaN,NaN"""), sep=',')
>>> df
    Types   c_years     o_periods   s_months    incidents
0   1       1           1           127.0       0.0
1   1       1           2           63.0        0.0
2   1       2           1           1095.0      3.0
3   1       2           2           1095.0      4.0
4   1       3           1           1512.0      6.0
5   1       3           2           3353.0      18.0
6   1       4           1           NaN         NaN
7   1       4           2           2244.0      11.0
14  2       4           1           NaN         NaN
>>> df[['c_years', 's_months', 'incidents']] = df.groupby(['Types', 'o_periods']).transform(lambda x: x.fillna(x.mean()))
>>> df
    Types   c_years     o_periods   s_months    incidents
0   1             1     1           127.000000      0.0
1   1             1     2           63.000000       0.0
2   1             2     1           1095.000000     3.0
3   1             2     2           1095.000000     4.0
4   1             3     1           1512.000000     6.0
5   1             3     2           3353.000000     18.0
6   1             4     1           911.333333      3.0
7   1             4     2           2244.000000     11.0
14  2             4     1           NaN             NaN
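
The attempt in the question combined fillna with transform('mean'). A minimal sketch of that variant, restricted to the two columns that actually contain NaN, might look like the following. The DataFrame is rebuilt by hand so the snippet stands alone, and the names cols and group_means are only illustrative.

```python
import numpy as np
import pandas as pd

# Same data as in the question, rebuilt by hand (index 14 is kept on purpose).
df = pd.DataFrame(
    {
        "Types": [1, 1, 1, 1, 1, 1, 1, 1, 2],
        "c_years": [1, 1, 2, 2, 3, 3, 4, 4, 4],
        "o_periods": [1, 2, 1, 2, 1, 2, 1, 2, 1],
        "s_months": [127.0, 63.0, 1095.0, 1095.0, 1512.0, 3353.0, np.nan, 2244.0, np.nan],
        "incidents": [0.0, 0.0, 3.0, 4.0, 6.0, 18.0, np.nan, 11.0, np.nan],
    },
    index=[0, 1, 2, 3, 4, 5, 6, 7, 14],
)

cols = ["s_months", "incidents"]  # illustrative name: the columns to fill

# Per-row mean of each (Types, o_periods) group, aligned to df's index.
group_means = df.groupby(["Types", "o_periods"])[cols].transform("mean")

# Fill only the missing cells; existing values and other columns are untouched.
df[cols] = df[cols].fillna(group_means)
print(df)
```

Compared with the line in the question, this uses groupby and transform (not grouby and tranform), selects the columns after the groupby call, refers to the key column as Types with a capital T, and assigns the result back instead of using inplace=True, which makes fillna return None.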

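The NaN in the last row (index 14) remains because that row belongs to a group (Types 2, o_periods 1) that has no values at all in the s_months and incidents columns, so there is no group mean to fill with.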
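
If such leftover NaN values also need to be filled, one possible approach is to fall back to a broader mean. The sketch below assumes that the overall column mean is an acceptable fallback, which the question does not state. The names cols, group_means and filled are illustrative.

```python
import numpy as np
import pandas as pd

# Same data as in the question, rebuilt by hand.
df = pd.DataFrame(
    {
        "Types": [1, 1, 1, 1, 1, 1, 1, 1, 2],
        "c_years": [1, 1, 2, 2, 3, 3, 4, 4, 4],
        "o_periods": [1, 2, 1, 2, 1, 2, 1, 2, 1],
        "s_months": [127.0, 63.0, 1095.0, 1095.0, 1512.0, 3353.0, np.nan, 2244.0, np.nan],
        "incidents": [0.0, 0.0, 3.0, 4.0, 6.0, 18.0, np.nan, 11.0, np.nan],
    },
    index=[0, 1, 2, 3, 4, 5, 6, 7, 14],
)

cols = ["s_months", "incidents"]

# Step 1: fill with the mean of each (Types, o_periods) group.
group_means = df.groupby(["Types", "o_periods"])[cols].transform("mean")
filled = df[cols].fillna(group_means)

# Step 2 (assumption): where the whole group was NaN, fall back to the
# overall column mean, computed from the original, unfilled data.
filled = filled.fillna(df[cols].mean())

df[cols] = filled
print(df)
```

Whether an overall mean is meaningful for a group with no observations depends on the data, so leaving those rows as NaN, as in the answer above, can be the safer choice.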

