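pandas - KeyError when trying to look up rows between two files using pandas
Problem description
I have two files: sample1.csv is a large file, and sample2.csv is a subset of it. I just want to create a DataFrame that contains all the columns of sample1.csv for the enterprise IDs ("enterpriseid") that appear in sample2.csv.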
import pandas as pd
import numpy as np
df_raw = pd.read_csv("sample1.csv",skip_blank_lines=True,error_bad_lines=False).fillna("")
df_res = pd.read_csv("sample2.csv").fillna("")
df_res_ent['enterpriseid'] = df_res["enterpriseid"]
df_f = df_raw[df_res_ent['enterpriseid']]
df_f
**Below is the error**
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-71-69c3a17d981e> in <module>
1 #df = pd.merge(df_res_ent,df_raw,on ='enterpriseid',how ='inner')
----> 2 df_f = df_raw[df_res_ent['enterpriseid']]
3 df_f
4
5 #output = df.to_csv("output.csv")
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
2804 if is_iterator(key):
2805 key = list(key)
-> 2806 indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
2807
2808 # take() does not accept boolean indexers
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
1550 keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
1551
-> 1552 self._validate_read_indexer(
1553 keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
1554 )
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
1638 if missing == len(indexer):
1639 axis_name = self.obj._get_axis_name(axis)
-> 1640 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
1641
1642 # We (temporarily) allow for some missing keys with .loc, except in
KeyError: "None of [Index(['AMPR.2Y2YI0', 'AMPR.CQW0UW', 'AMPR.ICWWWE', 'AMPR.KGGAYG',\n 'AMPR.OK0MGQ', 'AMPR.OWYUS2', 'AMPR.2GQSOQ', 'AMPR.04WOYW',\n 'AMPR.W00MWS', 'AMPR.QQKUIU',\n ...\n '', '', '', '', '', '', '', '', '', ''],\n dtype='object', length=14844)] are in the [columns]"
Solution
OK, so I found an alternative that gets me going, but I would still love to know what I was doing wrong above, or whether there are better alternatives. Thanks and happy coding!
df_m = pd.merge(df_res,df_raw,on ='enterpriseid',how ='inner')
df_f = df_m[df_m['enterpriseid'] != ""]
df_f.iloc[:,2:]
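As for what went wrong in the original attempt: indexing a DataFrame with a Series of strings, as in df_raw[df_res_ent['enterpriseid']], asks pandas for columns with those names, which is why the error says none of the IDs are in the [columns]. Below is a minimal sketch of a row filter with Series.isin instead. The two small frames and their values are hypothetical stand-ins for sample1.csv and sample2.csv, and the "name" column is invented for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for sample1.csv (full data) and sample2.csv (subset of ids).
df_raw = pd.DataFrame({
    "enterpriseid": ["AMPR.A", "AMPR.B", "AMPR.C", ""],
    "name": ["alpha", "beta", "gamma", "blank"],
})
df_res = pd.DataFrame({"enterpriseid": ["AMPR.A", "AMPR.C", ""]})

# Keep only the non-empty ids we want to look up.
wanted = df_res.loc[df_res["enterpriseid"] != "", "enterpriseid"]

# Indexing with a Series of strings selects COLUMNS by label, so this
# raises KeyError because no column is named "AMPR.A" or "AMPR.C".
try:
    df_raw[wanted]
    raised = False
except KeyError:
    raised = True

# A boolean mask selects ROWS: keep rows whose id appears in `wanted`.
df_f = df_raw[df_raw["enterpriseid"].isin(wanted)]
print(df_f)
```

Unlike an inner merge, the isin filter keeps only the columns of df_raw and does not repeat rows when an ID occurs more than once in the subset file, so no follow-up column slicing should be needed.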