"List index out of range" error when using the flow_from_dataframe method

Problem description

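I am using Python and TensorFlow, and I created a flow_from_dataframe helper (taken from open-source code on Kaggle) to build the training and validation data generators. The problem appears when I run the code that uses the same flow_from_dataframe method to create the test_X, test_Y set.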

This code originally worked as shown in the Kaggle kernels, but for some reason it no longer works for me.

I checked many Kaggle kernels, including:

https://www.kaggle.com/digitalchaos666/simple-vgg16/notebook
https://www.kaggle.com/kmader/attention-on-pretrained-vgg16-for-bone-age

Both use the same code for this step, but neither seems to run now. Even if you fork a kernel and run it as is, without any changes, it fails at this point:

train_datagen = ImageDataGenerator(samplewise_center=False, 
                        samplewise_std_normalization=False, 
                          horizontal_flip = True, 
                          vertical_flip = False, 
                          height_shift_range = 0.2, 
                          width_shift_range = 0.2, 
                          rotation_range = 5, 
                          shear_range = 0.01,
                          fill_mode = 'nearest',
                          zoom_range=0.25,
                         preprocessing_function = preprocess_input)

train_gen = flow_from_dataframe(train_datagen, df_train, 
                        path_col = 'path',
                        y_col = 'bone_age_zscore', 
                        target_size = IMG_SIZE,
                        color_mode = 'rgb',
                        batch_size = 32)

valid_gen = flow_from_dataframe(train_datagen, df_valid, 
                        path_col = 'path',
                        y_col = 'bone_age_zscore', 
                        target_size = IMG_SIZE,
                        color_mode = 'rgb',
                        batch_size = 256)

# used a fixed dataset for evaluating the algorithm, issue lies here
test_X, test_Y = next(flow_from_dataframe(train_datagen, 
                        df_valid, 
                        path_col = 'path',
                        y_col = 'bone_age_zscore', 
                        target_size = IMG_SIZE,
                        color_mode = 'rgb',
                        batch_size = 512))


The error message is as follows: 
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-53-10b59388b3f6> in <module>
     36                             target_size = IMG_SIZE,
     37                             color_mode = 'rgb',
---> 38                             batch_size = 512)) # one big batch
     39 
     40 print('Complete')

/opt/conda/lib/python3.6/site-packages/keras_preprocessing/image/iterator.py in __next__(self, *args, **kwargs)
    102 
    103     def __next__(self, *args, **kwargs):
--> 104         return self.next(*args, **kwargs)
    105 
    106     def next(self):

/opt/conda/lib/python3.6/site-packages/keras_preprocessing/image/iterator.py in next(self)
    114         # The transformation of images is not under thread lock
    115         # so it can be done in parallel
--> 116         return self._get_batches_of_transformed_samples(index_array)
    117 
    118     def _get_batches_of_transformed_samples(self, index_array):

/opt/conda/lib/python3.6/site-packages/keras_preprocessing/image/iterator.py in _get_batches_of_transformed_samples(self, index_array)
    225         filepaths = self.filepaths
    226         for i, j in enumerate(index_array):
--> 227             img = load_img(filepaths[j],
    228                            color_mode=self.color_mode,
    229                            target_size=self.target_size,

IndexError: list index out of range
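
The last frame of the traceback fails on filepaths[j], where j comes from index_array. That means the iterator produced an index at least as large as the number of file paths it holds. The sketch below reproduces only that mismatch in plain Python. The file names, the index values, and the helper out_of_range_indices are made up for illustration and are not part of keras_preprocessing.

```python
# Stand-in for the failing line in keras_preprocessing's iterator:
#     img = load_img(filepaths[j], ...)
# `filepaths` and `index_array` below are hypothetical values.

def out_of_range_indices(index_array, filepaths):
    """Return the indices in index_array that cannot be used on filepaths."""
    n = len(filepaths)
    return [j for j in index_array if j >= n or j < -n]


filepaths = ["img_0.png", "img_1.png", "img_2.png"]  # iterator holds 3 paths
index_array = [4, 0, 2, 1, 3]                        # but indexes 5 samples

bad = out_of_range_indices(index_array, filepaths)
print("indices with no matching file path:", bad)

try:
    for j in index_array:
        _ = filepaths[j]  # same kind of lookup as in the traceback
except IndexError as exc:
    print("IndexError:", exc)
```

A check like this can help confirm whether the sample count that drives index_array and the length of the path list have drifted apart, which is one possible cause when a helper sets those attributes by hand.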

Tags: python, tensorflow

Solution


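I solved my problem by creating a separate dataframe for each generator. train_flow and valid_flow both use a flow_from_dataframe method that accepts a seed value. This keeps my training and validation sets the same every time I run the code, which is what I wanted.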

My test flows, on the other hand, do not take a seed value, so I created a new method for them.
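
To illustrate why the shared seed matters when two generators are read side by side, here is a minimal plain-Python sketch. It is a stand-in, not how Keras shuffles internally: shuffled_batches, the sample file names, and the SEED value are all hypothetical. It only shows that two shuffled streams built with the same seed visit rows in the same order, so the image batch and the gender batch stay paired.

```python
import random


def shuffled_batches(values, batch_size, seed):
    """Yield batches of `values` forever, reshuffling each pass with a seeded RNG."""
    rng = random.Random(seed)
    order = list(range(len(values)))
    while True:
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            yield [values[i] for i in order[start:start + batch_size]]


paths = ["a.png", "b.png", "c.png", "d.png", "e.png", "f.png"]
genders = ["M", "F", "F", "M", "F", "M"]  # genders[i] belongs to paths[i]
pairing = dict(zip(paths, genders))

SEED = 42  # hypothetical value
img_stream = shuffled_batches(paths, batch_size=4, seed=SEED)
gender_stream = shuffled_batches(genders, batch_size=4, seed=SEED)

first_imgs = next(img_stream)
first_genders = next(gender_stream)

# Every gender in the batch still belongs to the image at the same position
aligned = all(pairing[p] == g for p, g in zip(first_imgs, first_genders))
print(first_imgs, first_genders, aligned)
```

With different seeds the two streams would generally drift apart, and the gender values would no longer match their images.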

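import gc

# Assumption: ImageDataGenerator is imported elsewhere, e.g.
# from tensorflow.keras.preprocessing.image import ImageDataGenerator
train_idg = ImageDataGenerator(zoom_range=0.2,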
                           fill_mode='nearest',
                           rotation_range=25,  
                           width_shift_range=0.25,  
                           height_shift_range=0.25,  
                           vertical_flip=False, 
                           horizontal_flip=True,
                           shear_range = 0.2,
                           samplewise_center=False, 
                           samplewise_std_normalization=False)

val_idg = ImageDataGenerator(width_shift_range=0.25, 
                         height_shift_range=0.25, 
                         horizontal_flip=True)

test_idg = ImageDataGenerator()

####
def flow_from_dataframe(imgDatGen, df, batch_size, seed, img_size):
    gc.collect()
    gen_img = imgDatGen.flow_from_dataframe(dataframe=df,
        x_col='path', y_col='boneage_zscore',
        batch_size=batch_size, seed=seed, shuffle=True, class_mode='other',
        target_size=img_size, color_mode='rgb',
        drop_duplicates=False)

    gen_gender = imgDatGen.flow_from_dataframe(dataframe=df,
        x_col='path', y_col='gender',
        batch_size=batch_size, seed=seed, shuffle=True, class_mode='other',
        target_size=img_size, color_mode='rgb',
        drop_duplicates=False)

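    # Both generators use the same seed, so they are meant to visit rows in the same order
    while True:
        X1i = next(gen_img)      # (image batch, bone-age z-score batch)
        X2i = next(gen_gender)   # (image batch, gender batch)
        gc.collect()
        # Model inputs: [images, gender]; target: bone-age z-score
        yield [X1i[0], X2i[1]], X1i[1]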

####

train_flow = flow_from_dataframe(train_idg, train_df, BATCH_SIZE_TRAIN, SEED, IMG_SIZE)

valid_flow = flow_from_dataframe(val_idg, valid_df, BATCH_SIZE_VAL, SEED, IMG_SIZE)

####


def test_gen_2inputs(imgDatGen, df, batch_size, img_size):
    gc.collect()
    gen_img = imgDatGen.flow_from_dataframe(dataframe=df,
        x_col='path', y_col='boneage_zscore',
        batch_size=batch_size, shuffle=False, class_mode='other',
        target_size=img_size, color_mode='rgb',
        drop_duplicates=False)

    gen_gender = imgDatGen.flow_from_dataframe(dataframe=df,
        x_col='path', y_col='gender',
        batch_size=batch_size, shuffle=False, class_mode='other',
        target_size=img_size, color_mode='rgb',
        drop_duplicates=False)

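    # shuffle=False on both generators, so rows come back in dataframe order
    while True:
        X1i = next(gen_img)      # (image batch, bone-age z-score batch)
        X2i = next(gen_gender)   # (image batch, gender batch)
        gc.collect()
        # Model inputs: [images, gender]; target: bone-age z-score
        yield [X1i[0], X2i[1]], X1i[1]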

test_flow = test_gen_2inputs(test_idg, test_df, 789, IMG_SIZE)
male_test_flow = test_gen_2inputs(test_idg, male_df, 789, IMG_SIZE)
female_test_flow = test_gen_2inputs(test_idg, female_df, 789, IMG_SIZE)

After this, the code below ran successfully:

train_X, train_Y = next(train_flow)
test_X, test_Y = next(test_flow)
male_test_X, male_test_Y = next(male_test_flow)
female_test_X, female_test_Y = next(female_test_flow)
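
Because the generators above yield [X1i[0], X2i[1]], X1i[1], the X returned by next() is a list of two inputs (images and gender) rather than a single array. The plain-Python sketch below mimics that structure with a hypothetical two_input_batches helper and toy data. It assumes rows are served in order and that a batch size larger than the data simply returns every row, which is how this stand-in behaves; it does not call Keras.

```python
def two_input_batches(images, genders, labels, batch_size):
    """Yield ([image_batch, gender_batch], label_batch) forever, in row order."""
    n = len(images)
    while True:
        for start in range(0, n, batch_size):
            stop = start + batch_size
            yield [images[start:stop], genders[start:stop]], labels[start:stop]


# Toy stand-ins for image arrays, gender flags, and bone-age z-scores
images = ["img0", "img1", "img2", "img3", "img4"]
genders = [0, 1, 1, 0, 1]
labels = [0.3, -1.2, 0.8, 0.0, 1.5]

# batch_size larger than the data: the first batch holds every row
flow = two_input_batches(images, genders, labels, batch_size=8)
test_X, test_Y = next(flow)

print(len(test_X))     # number of model inputs
print(len(test_X[0]))  # rows in the image input
print(len(test_Y))     # rows in the target
```

Since the generator loops forever, calling next() once is enough to obtain a fixed evaluation batch; further calls would start over from the first row.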
