Pandas: fill null values with groupby values

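Problem description

I am trying to fill the null values in all numeric columns of a DataFrame.

The code below loops over each numeric column, groups by a categorical feature, and computes the median of the target column.

It then creates a new column that copies the value if one exists. If the value is null, it should instead copy the groupby value that matches the categorical value of the row containing the n/a.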

# fill in numeric nulls with median based on job
for i in dfint:
    print(i)

for i in dfint:
    if i in ["TARGET_BAD_FLAG", "TARGET_LOSS_AMT"]: continue
    print(i)
    group=df.groupby("JOB")[i].median()
    print(group)
    df["IMP_"+i]=df[i].fillna(group[group.index.get_loc(df.loc[df[i].isna(),"JOB"])])
    #the line below works but fills in all nulls with the median for the "Mgr" job category, the code above should find the job category for the null record and pull the groupby value 
    #df["IMP_"+i]=df[i].fillna(group[group.index.get_loc("Mgr")])

The problem seems to be the expression I pass to .get_loc. This is the output:

TARGET_BAD_FLAG
TARGET_LOSS_AMT
LOAN
MORTDUE
VALUE
YOJ
DEROG
DELINQ
CLAGE
NINQ
CLNO
DEBTINC
LOAN
JOB
Mgr        18100
Office     16200
Other      15200
ProfExe    17300
Sales      14300
Self       24000
Name: LOAN, dtype: int64
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-207-f8a76179c818> in <module>
      8     group=df.groupby("JOB")[i].median()
      9     print(group)
---> 10     df["IMP_"+i]=df[i].fillna(group[group.index.get_loc(df.loc[df[i].isna(),"JOB"])])
     11     #the line below works but fills in all nulls with the median for the "Mgr" job category, the code above should find the job category for the null record and pull the groupby value
     12     #df["IMP_"+i]=df[i].fillna(group[group.index.get_loc("Mgr")])

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2895                 )
   2896             try:
-> 2897                 return self._engine.get_loc(key)
   2898             except KeyError:
   2899                 return self._engine.get_loc(self._maybe_cast_indexer(key))

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

TypeError: 'Series([], Name: JOB, dtype: object)' is an invalid key

Is there a way to modify that line so that it works as intended?

Tags: python, pandas

Solution


You wrote df.loc[df[i].isna(),"JOB"], which returns a pandas Series rather than the single key (one index label) that pandas.Index.get_loc requires.
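
One possible way to get the intended behavior is to skip get_loc entirely and look up the median for each row's JOB with Series.map, then pass that row-aligned Series to fillna. The sketch below is an illustration rather than the original poster's final code: the column names JOB and LOAN come from the question, while the toy values and the variable names mgr_pos and imp_alt are made up for the example.

```python
import numpy as np
import pandas as pd

# Toy data: column names follow the question, the values are invented.
df = pd.DataFrame({
    "JOB": ["Mgr", "Mgr", "Mgr", "Sales", "Sales", "Sales"],
    "LOAN": [100.0, 300.0, np.nan, 10.0, np.nan, 30.0],
})

# Median of LOAN per job category (NaN values are skipped).
group = df.groupby("JOB")["LOAN"].median()

# get_loc accepts ONE label and returns its position in the index,
# which is why passing a whole Series of labels raises an error.
mgr_pos = group.index.get_loc("Mgr")

# Map every row's JOB to its group median, then fill only the missing values.
df["IMP_LOAN"] = df["LOAN"].fillna(df["JOB"].map(group))

# Equivalent form: transform broadcasts each group's median back to its rows.
imp_alt = df["LOAN"].fillna(df.groupby("JOB")["LOAN"].transform("median"))

print(df)
```

Inside the loop from the question, the failing line would then become df["IMP_"+i] = df[i].fillna(df["JOB"].map(group)). Rows whose JOB is itself missing have no group median to look up, so they stay null with either form.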

