KeyError when dropping labels in a Pandas DataFrame

Question

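I loaded a dataset from a CSV file into a DataFrame. I want to show the strongest correlations between columns (the top 10 negative and the top 10 positive).

I found some code on this site that I thought would help: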

def get_redundant_pairs(df):
    '''Get diagonal and lower triangular pairs of correlation matrix'''
    pairs_to_drop = set()
    cols = df.columns
    for i in range(0, df.shape[1]):
        for j in range(0, i+1):
            pairs_to_drop.add((cols[i], cols[j]))
    return pairs_to_drop


def get_top_abs_correlations(df, n=5):
    au_corr = df.corr().abs().unstack()
    labels_to_drop = get_redundant_pairs(df)
    au_corr = au_corr.drop(labels=labels_to_drop).sort_values(ascending=False)
    return au_corr[0:n]

I call this function on my DataFrame:

train = pd.read_csv('/content/drive/My Drive/DSF_HW3_Datasets/train.csv')
get_top_abs_correlations(train.loc[:, train.columns != 'Id'],10)

I get a KeyError:

KeyError: 'Foundation'

During handling of the above exception, another exception occurred:
....
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/multi.py in get_loc(self, key, method)
   2404 
   2405         if keylen == self.nlevels and self.is_unique: 
-> 2406             return self._engine.get_loc(key)
   2407 
   2408         # -- partial selection or non-unique index

 pandas/_libs/index.pyx in pandas._libs.index.BaseMultiIndexCodesEngine.get_loc()

 KeyError: ('Foundation', 'OverallQual')

How can I resolve this error? The train.csv file: https://pastebin.com/vTh6md5W

Tags: python, pandas

Answer
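
The KeyError looks like a label mismatch. get_redundant_pairs builds its pairs from every column of df, while the correlation matrix only covers numeric columns (older pandas versions silently left text columns, presumably including 'Foundation', out of corr()). drop() is therefore asked to remove pairs that do not exist. Below is a minimal sketch of a fix that keeps the shape of the original function. The toy frame and its values are made up for illustration, and select_dtypes is an assumption about how to restrict to numeric columns in a way that works across pandas versions.

```python
import pandas as pd


def get_top_abs_correlations(df, n=5):
    """Top-n absolute pairwise correlations among the numeric columns of df."""
    # Keep numeric columns only, so the labels match the correlation matrix.
    corr = df.select_dtypes(include="number").corr().abs()
    cols = corr.columns
    # Diagonal and lower-triangle pairs, built from the matrix's own columns
    # rather than from df.columns.
    pairs_to_drop = [(cols[i], cols[j])
                     for i in range(len(cols))
                     for j in range(i + 1)]
    au_corr = corr.unstack().drop(labels=pairs_to_drop)
    return au_corr.sort_values(ascending=False).head(n)


# Hypothetical toy data; "Foundation" stands in for any text column.
toy = pd.DataFrame({
    "Foundation": ["PConc", "CBlock", "PConc", "Slab"],
    "OverallQual": [7, 6, 8, 5],
    "SalePrice": [208500, 181500, 223500, 140000],
    "YearBuilt": [2003, 1976, 2001, 1915],
})
top = get_top_abs_correlations(toy, n=2)
print(top)
```

Because the pairs now come from corr.columns, every label passed to drop() exists in the unstacked index.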


Mask out the redundant entries and use nlargest:

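import numpy as np

# get the correlation matrix (numeric columns only; recent pandas
# versions raise an error if text columns are passed to corr())
corr = df.select_dtypes(include="number").corr()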

# mask away the lower triangle and diagonal
mask = np.triu(np.ones_like(corr),1) == 1

# get the upper triangle (excluding diagonal) by masking and stack:
corr = corr.where(mask).stack()

# 10 largest by absolute values
max10 = corr.abs().nlargest(10)

Output (max10):

GarageCars    GarageArea      0.882475
YearBuilt     GarageYrBlt     0.825667
GrLivArea     TotRmsAbvGrd    0.825489
TotalBsmtSF   1stFlrSF        0.819530
OverallQual   SalePrice       0.790982
GrLivArea     SalePrice       0.708624
2ndFlrSF      GrLivArea       0.687501
BedroomAbvGr  TotRmsAbvGrd    0.676620
BsmtFinSF1    BsmtFullBath    0.649212
YearRemodAdd  GarageYrBlt     0.642277
dtype: float64

To get the original (signed) correlations:

corr.loc[max10.index]

Coincidentally, these match the absolute values, since all ten of the strongest correlations are positive.
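
The question asked for the top 10 positive and the top 10 negative correlations separately. Here is one way that could look, as a sketch built on the same upper-triangle mask. The function name top_signed_correlations and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd


def top_signed_correlations(df, n=10):
    """Return (n most positive, n most negative) pairwise correlations."""
    corr = df.select_dtypes(include="number").corr()
    # Boolean mask that is True strictly above the diagonal.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    # Masked entries become NaN; dropna() removes them explicitly so each
    # column pair appears exactly once.
    pairs = corr.where(upper).stack().dropna()
    return pairs.nlargest(n), pairs.nsmallest(n)


# Hypothetical toy data: "b" rises with "a", "c" falls as "a" rises.
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 11],
    "c": [5, 4, 3, 2, 1],
    "label": list("vwxyz"),
})
pos, neg = top_signed_correlations(toy, n=1)
print(pos)
print(neg)
```

With the real data, top_signed_correlations(train.loc[:, train.columns != 'Id'], 10) would return the two rankings as separate Series. If fewer than n pairs are negative, nsmallest simply continues into the smallest positive values.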

