pandas KeyError when enumerating a PyTorch DataLoader

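Problem description

Introduction

I am trying to load images by looking up their names in a pandas DataFrame that holds a list of paths. I have implemented a custom dataset, and I load the DataFrame from a CSV file with pandas.read_csv().

Code excerpt

Dataset class: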

class aDataset(torch.utils.data.Dataset):
    def __init__(self, img_dir, csv_file, teacher_transform=None, student_transform=None):
        self.img_dir = img_dir
        self.image_list = pd.read_csv(csv_file, names=['path'], index_col=False)
        self.teacher_transform=teacher_transform
        self.student_transform=student_transform
    
    def __len__(self):
        return len(self.image_list.index)

    def __getitem__(self, idx):
        teach_image_path = os.path.join(self.img_dir, self.image_list[idx][2:])
        teach_image = read_image(teach_image_path)
        image_iden = "/".join(teach_image_path.split("/")[:-1])
        iden_list = random.sample(os.listdir(image_iden), 100)
        temp_iden_list = [os.path.join(image_iden, i) for i in iden_list]
        iden_list = temp_iden_list
        iden_list.append(teach_image_path)
        student_image = read_image(random.choice(iden_list))
        teach_image = self.teacher_transform(teach_image)
        student_image = self.student_transform(student_image)
        return teach_image, student_image

DataLoader code:

train = aDataset(train_img_dir, train_csv_file, teacher_transform=teacher_transform, student_transform=student_transform)
trainloader = torch.utils.data.DataLoader(train, batch_size=BATCHSIZE, num_workers=NUM_WORKERS, shuffle=True)

Here, NUM_WORKERS=0 and BATCHSIZE=64.

CSV format

The CSV file looks like this:

./train/n000156/0299_01.jpg
./train/n000156/0352_01.jpg
./train/n000156/0223_01.jpg
./train/n000156/0072_01.jpg
./train/n000156/0088_01.jpg
./train/n000156/0024_02.jpg
./train/n000156/0139_01.jpg
.
.
.

The CSV file has 523649 rows.

Error

However, I get the following error immediately after starting training:

Traceback (most recent call last):
  File "/home/starc52/miniconda3/envs/p3ver/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 453054

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "verMini.py", line 331, in <module>
    for iter_num, (teacher_images, student_images) in tqdm(enumerate(trainloader)):
  File "/home/starc52/miniconda3/envs/p3ver/lib/python3.8/site-packages/tqdm/std.py", line 1185, in __iter__
    for obj in iterable:
  File "/home/starc52/miniconda3/envs/p3ver/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/home/starc52/miniconda3/envs/p3ver/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 561, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/starc52/miniconda3/envs/p3ver/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/starc52/miniconda3/envs/p3ver/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "verMini.py", line 74, in __getitem__
    teach_image_path = os.path.join(self.img_dir, self.image_list[idx][2:])
  File "/home/starc52/miniconda3/envs/p3ver/lib/python3.8/site-packages/pandas/core/frame.py", line 3455, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/starc52/miniconda3/envs/p3ver/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 453054

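Suspected bug

On every run the code stops with a KeyError for a different, seemingly random key. I suspect a problem in how pandas interacts with the enumerate function.

Any direction or hints would be helpful. Thanks in advance!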

Solution

I changed self.image_list[idx][2:] to self.image_list.loc[idx, 'path'] and it started working. However, I don't know why it didn't work before. This silly problem took me a day to figure out.
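
A likely explanation, consistent with the traceback (frame.py's `__getitem__` calls `self.columns.get_loc(key)`): square-bracket indexing on a DataFrame with a single scalar selects a column by label, not a row. The frame here has only one column, 'path', so any integer idx raises KeyError. Since shuffle=True makes the first requested idx differ between runs, the failing key looks random. The sketch below reproduces this with a made-up in-memory CSV built from three of the sample rows above; it stands in for the real file and is not the asker's actual data pipeline.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for the real CSV file (three sample rows).
csv_text = (
    "./train/n000156/0299_01.jpg\n"
    "./train/n000156/0352_01.jpg\n"
    "./train/n000156/0223_01.jpg\n"
)
image_list = pd.read_csv(io.StringIO(csv_text), names=["path"], index_col=False)

idx = 1

# DataFrame[key] looks `key` up among the column labels (only "path" exists),
# so an integer key raises KeyError no matter which row number it is.
try:
    image_list[idx]
    raised = False
except KeyError:
    raised = True

# .loc[row_label, column_label] selects a single cell instead.
path = image_list.loc[idx, "path"]
print(raised, path)
```

Note that the working version drops the [2:] slice, so the path keeps its leading "./"; os.path.join still produces a usable path in that case.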

Tags: python, python-3.x, pandas, pytorch, torchvision

Answer

Does it work if you replace self.image_list[idx][2:] with self.image_list['path'][idx]?
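
The suggestion selects the 'path' column first and then the row. A small sketch of how that lookup compares with the alternatives, using two made-up rows in place of the real CSV: `['path'][idx]` and `.loc` find rows by index label, which matches the row position only because the frame has a default 0..n-1 index, while `.iloc` is always positional. The `build_path` helper and the "/data" directory are hypothetical; they only show where the original [2:] slice could be applied.

```python
import os

import pandas as pd

# Made-up rows standing in for the real CSV contents.
image_list = pd.DataFrame(
    {"path": ["./train/n000156/0299_01.jpg", "./train/n000156/0352_01.jpg"]}
)

idx = 1

# Column first, then row label: returns the string stored in that cell.
by_column = image_list["path"][idx]
# Row label and column label in one call.
by_loc = image_list.loc[idx, "path"]
# Purely positional row lookup, independent of index labels.
by_iloc = image_list["path"].iloc[idx]


def build_path(img_dir, rel_path):
    """Hypothetical helper: drop a leading './' and join with img_dir."""
    if rel_path.startswith("./"):
        rel_path = rel_path[2:]
    return os.path.join(img_dir, rel_path)


full_path = build_path("/data", by_column)
print(full_path)
```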

