KeyError when trying to do a lookup between two files using pandas

Problem description

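I have two files: sample1.csv is a large file, and sample2.csv is a subset of it. I just want to create a DataFrame that contains all the columns of sample1.csv for the rows whose `enterpriseid` appears in sample2.csv.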

[Image: data from sample1.csv]

[Image: data from sample2.csv]

    import pandas as pd 
    import numpy as np

    df_raw = pd.read_csv("sample1.csv",skip_blank_lines=True,error_bad_lines=False).fillna("")

    df_res = pd.read_csv("sample2.csv").fillna("")
    df_res_ent['enterpriseid'] = df_res["enterpriseid"]
    df_f = df_raw[df_res_ent['enterpriseid']]
    df_f

**Below is the error**

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-71-69c3a17d981e> in <module>
      1 #df = pd.merge(df_res_ent,df_raw,on ='enterpriseid',how ='inner')
----> 2 df_f = df_raw[df_res_ent['enterpriseid']]
      3 df_f
      4 
      5 #output = df.to_csv("output.csv")

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2804             if is_iterator(key):
   2805                 key = list(key)
-> 2806             indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
   2807 
   2808         # take() does not accept boolean indexers

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
   1550             keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
   1551 
-> 1552         self._validate_read_indexer(
   1553             keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
   1554         )

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
   1638             if missing == len(indexer):
   1639                 axis_name = self.obj._get_axis_name(axis)
-> 1640                 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   1641 
   1642             # We (temporarily) allow for some missing keys with .loc, except in

KeyError: "None of [Index(['AMPR.2Y2YI0', 'AMPR.CQW0UW', 'AMPR.ICWWWE', 'AMPR.KGGAYG',\n       'AMPR.OK0MGQ', 'AMPR.OWYUS2', 'AMPR.2GQSOQ', 'AMPR.04WOYW',\n       'AMPR.W00MWS', 'AMPR.QQKUIU',\n       ...\n       '', '', '', '', '', '', '', '', '', ''],\n      dtype='object', length=14844)] are in the [columns]"

Tags: pandas, lookup

Solution


OK, so I found an alternative that gets me going, but I would still love to know what I was doing wrong above, or whether there are better alternatives. Thanks and happy coding!

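    # Inner join on enterpriseid: keeps only the IDs present in both frames
    df_m = pd.merge(df_res, df_raw, on='enterpriseid', how='inner')
    # Drop rows whose enterpriseid was blank (turned into "" by fillna)
    df_f = df_m[df_m['enterpriseid'] != ""]
    df_f.iloc[:, 2:]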
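
As for what went wrong in the original attempt: indexing a DataFrame with a list-like of strings, as in `df_raw[df_res_ent['enterpriseid']]`, asks pandas for *columns* with those labels, which is why the message ends with "are in the [columns]". The many `''` entries in the error most likely come from blank rows in sample2.csv that `fillna("")` turned into empty strings. One possible alternative to the merge is a row filter with `Series.isin`. The sketch below uses small made-up frames in place of the two CSV files (the `name` and `region` columns and all values are hypothetical); it only assumes that both files share an `enterpriseid` column, as in the question.

```python
import pandas as pd

# Hypothetical stand-ins for sample1.csv (full data) and sample2.csv (subset of IDs)
df_raw = pd.DataFrame({
    "enterpriseid": ["AMPR.A", "AMPR.B", "AMPR.C", "AMPR.D"],
    "name": ["alpha", "beta", "gamma", "delta"],
    "region": ["east", "west", "east", "north"],
})
df_res = pd.DataFrame({"enterpriseid": ["AMPR.B", "AMPR.D", None]})

# Drop missing IDs instead of filling them with "" so they never enter the lookup
wanted = df_res["enterpriseid"].dropna()

# df_raw[wanted] treats the IDs as column labels, which reproduces the KeyError
column_lookup_failed = False
try:
    df_raw[wanted]
except KeyError:
    column_lookup_failed = True

# Row filter: keep the rows of df_raw whose enterpriseid is one of the wanted IDs
df_f = df_raw[df_raw["enterpriseid"].isin(wanted)]
print(df_f)
```

Unlike the merge, this keeps exactly the columns of `df_raw` and does not duplicate rows when an ID appears more than once in the subset file, so no `iloc` trimming is needed afterwards.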
