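python - pandas KeyError when enumerating a PyTorch DataLoader
Question
Introduction
I am trying to load images by looking up their names in a pandas DataFrame that contains a list of paths. I have implemented a custom dataset, and I load the DataFrame from a CSV file with pandas.read_csv().
Code excerpts
Dataset class: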
class aDataset(torch.utils.data.Dataset):
    def __init__(self, img_dir, csv_file, teacher_transform=None, student_transform=None):
        self.img_dir = img_dir
        self.image_list = pd.read_csv(csv_file, names=['path'], index_col=False)
        self.teacher_transform = teacher_transform
        self.student_transform = student_transform

    def __len__(self):
        return len(self.image_list.index)

    def __getitem__(self, idx):
        teach_image_path = os.path.join(self.img_dir, self.image_list[idx][2:])
        teach_image = read_image(teach_image_path)
        image_iden = "/".join(teach_image_path.split("/")[:-1])
        iden_list = random.sample(os.listdir(image_iden), 100)
        temp_iden_list = [os.path.join(image_iden, i) for i in iden_list]
        iden_list = temp_iden_list
        iden_list.append(teach_image_path)
        student_image = read_image(random.choice(iden_list))
        teach_image = self.teacher_transform(teach_image)
        student_image = self.student_transform(student_image)
        return teach_image, student_image
DataLoader code:
train = aDataset(train_img_dir, train_csv_file, teacher_transform=teacher_transform, student_transform=student_transform)
trainloader = torch.utils.data.DataLoader(train, batch_size=BATCHSIZE, num_workers=NUM_WORKERS, shuffle=True)
Here, NUM_WORKERS=0 and BATCHSIZE=64.
CSV format
The CSV file looks like this:
./train/n000156/0299_01.jpg
./train/n000156/0352_01.jpg
./train/n000156/0223_01.jpg
./train/n000156/0072_01.jpg
./train/n000156/0088_01.jpg
./train/n000156/0024_02.jpg
./train/n000156/0139_01.jpg
.
.
.
The CSV file has 523649 rows.
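For reference, here is a minimal sketch of what a read_csv call like the one in the dataset class produces for a file in this format. The in-memory sample is an assumption for illustration: it stands in for the real CSV and holds only the first three paths listed above.

```python
import io

import pandas as pd

# In-memory stand-in for the real CSV file
# (assumption: one path per line, no header row).
sample_csv = io.StringIO(
    "./train/n000156/0299_01.jpg\n"
    "./train/n000156/0352_01.jpg\n"
    "./train/n000156/0223_01.jpg\n"
)

image_list = pd.read_csv(sample_csv, names=["path"], index_col=False)

print(list(image_list.columns))  # the only column label is 'path'
print(list(image_list.index))    # row labels are the default 0..n-1 range
print(image_list.shape)          # (number of rows, 1)
```

The result is a single-column DataFrame: the paths are rows, and 'path' is the only column label.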
Error
However, I get the following error immediately after starting training:
Traceback (most recent call last):
  File "/home/starc52/miniconda3/envs/p3ver/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 453054

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "verMini.py", line 331, in <module>
    for iter_num, (teacher_images, student_images) in tqdm(enumerate(trainloader)):
  File "/home/starc52/miniconda3/envs/p3ver/lib/python3.8/site-packages/tqdm/std.py", line 1185, in __iter__
    for obj in iterable:
  File "/home/starc52/miniconda3/envs/p3ver/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/home/starc52/miniconda3/envs/p3ver/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 561, in _next_data
    data = self._dataset_fetcher.fetch(index) # may raise StopIteration
  File "/home/starc52/miniconda3/envs/p3ver/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/starc52/miniconda3/envs/p3ver/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "verMini.py", line 74, in __getitem__
    teach_image_path = os.path.join(self.img_dir, self.image_list[idx][2:])
  File "/home/starc52/miniconda3/envs/p3ver/lib/python3.8/site-packages/pandas/core/frame.py", line 3455, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/starc52/miniconda3/envs/p3ver/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 453054
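Suspected cause
On every run, the code stops with a KeyError on a different, seemingly random key. There seems to be some problem with how pandas interacts with the enumerate function.
Any directions or hints would be helpful. Thanks in advance!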
Answer
I changed the line
self.image_list[idx][2:]
to
self.image_list.loc[idx, 'path']
and it started working. However, I don't know why it didn't work before. This silly problem took me a day to figure out.
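A likely explanation, consistent with the traceback (frame.py's __getitem__ calls self.columns.get_loc(key)): indexing a DataFrame with square brackets and a scalar looks the key up among the column labels, not the rows. The only column here is 'path', so every integer index raises KeyError. Because the loader shuffles, the failing key differs on each run. The sketch below reproduces this on a small hypothetical frame. The sample paths are made up for illustration.

```python
import pandas as pd

# Hypothetical three-row frame with the same single 'path' column as the question.
image_list = pd.DataFrame(
    {"path": ["./train/a/0.jpg", "./train/a/1.jpg", "./train/a/2.jpg"]}
)

idx = 1

# df[idx] is a column lookup, so an integer that is not a column label fails.
try:
    image_list[idx]
    raised = False
except KeyError:
    raised = True

# Row access by label (.loc) or by position (.iloc) returns the path string.
by_label = image_list.loc[idx, "path"]
by_position = image_list["path"].iloc[idx]

# The original [2:] slice was meant to strip the leading "./";
# it can still be applied to the string that .loc returns.
relative = by_label[2:]

print(raised, by_label, by_position, relative)
```

Note that the .loc version above no longer strips the leading "./". os.path.join still produces a usable path with it, but the slice can be kept on the returned string if the cleaner form is wanted.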
Answer
Does it work if you replace
self.image_list[idx][2:]
with
self.image_list['path'][idx]
?
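This form selects the 'path' column first and then looks idx up in the row index. Labels and positions coincide here because read_csv assigned the default 0..n-1 index. One way to make the lookup purely positional is to convert the column to a plain list once, as in the sketch below. The class name PathLookup, the "/data" directory, and the sample paths are hypothetical. In the real code the same two lines would go into aDataset.__init__ and __getitem__.

```python
import os

import pandas as pd


class PathLookup:
    """Hypothetical helper: resolves an integer position to an image path."""

    def __init__(self, img_dir, frame):
        self.img_dir = img_dir
        # Convert the column to a plain list once, so indexing is positional.
        self.paths = frame["path"].tolist()

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # Drop the leading "./" as the original [2:] slice intended.
        return os.path.join(self.img_dir, self.paths[idx][2:])


frame = pd.DataFrame({"path": ["./train/a/0.jpg", "./train/a/1.jpg"]})
lookup = PathLookup("/data", frame)

# With the default RangeIndex, the Series lookup and the list agree.
same_as_series = frame["path"][1] == lookup.paths[1]
print(len(lookup), lookup[1], same_as_series)
```

Holding the paths in a list also avoids a pandas lookup on every __getitem__ call, which may matter with several hundred thousand rows.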