python - "list index out of range" error when using the flow_from_dataframe method
Problem description
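I'm using Python and TensorFlow, and I created generators for the training and validation data with a flow_from_dataframe method taken from open-source Kaggle code. The problem appears when I run the code that uses the same flow_from_dataframe method to create the test_X, test_Y sets.
This code originally worked, as the Kaggle notebooks show, but for some reason it does not work for me.
I checked many Kaggle kernels, among them:
https://www.kaggle.com/digitalchaos666/simple-vgg16/notebook and https://www.kaggle.com/kmader/attention-on-pretrained-vgg16-for-bone-age
Both use the same code for this step, but it no longer runs. Even if you fork a kernel and run the code exactly as-is, without any changes, it fails at this point: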
train_datagen = ImageDataGenerator(samplewise_center=False,
                                   samplewise_std_normalization=False,
                                   horizontal_flip=True,
                                   vertical_flip=False,
                                   height_shift_range=0.2,
                                   width_shift_range=0.2,
                                   rotation_range=5,
                                   shear_range=0.01,
                                   fill_mode='nearest',
                                   zoom_range=0.25,
                                   preprocessing_function=preprocess_input)
train_gen = flow_from_dataframe(train_datagen, df_train,
                                path_col='path',
                                y_col='bone_age_zscore',
                                target_size=IMG_SIZE,
                                color_mode='rgb',
                                batch_size=32)
valid_gen = flow_from_dataframe(train_datagen, df_valid,
                                path_col='path',
                                y_col='bone_age_zscore',
                                target_size=IMG_SIZE,
                                color_mode='rgb',
                                batch_size=256)
# used a fixed dataset for evaluating the algorithm, issue lies here
test_X, test_Y = next(flow_from_dataframe(train_datagen,
                                          df_valid,
                                          path_col='path',
                                          y_col='bone_age_zscore',
                                          target_size=IMG_SIZE,
                                          color_mode='rgb',
                                          batch_size=512))
The error message is as follows:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-53-10b59388b3f6> in <module>
36 target_size = IMG_SIZE,
37 color_mode = 'rgb',
---> 38 batch_size = 512)) # one big batch
39
40 print('Complete')
/opt/conda/lib/python3.6/site-packages/keras_preprocessing/image/iterator.py in __next__(self, *args, **kwargs)
102
103 def __next__(self, *args, **kwargs):
--> 104 return self.next(*args, **kwargs)
105
106 def next(self):
/opt/conda/lib/python3.6/site-packages/keras_preprocessing/image/iterator.py in next(self)
114 # The transformation of images is not under thread lock
115 # so it can be done in parallel
--> 116 return self._get_batches_of_transformed_samples(index_array)
117
118 def _get_batches_of_transformed_samples(self, index_array):
/opt/conda/lib/python3.6/site-packages/keras_preprocessing/image/iterator.py in _get_batches_of_transformed_samples(self, index_array)
225 filepaths = self.filepaths
226 for i, j in enumerate(index_array):
--> 227 img = load_img(filepaths[j],
228 color_mode=self.color_mode,
229 target_size=self.target_size,
IndexError: list index out of range
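Note that the traceback fails on `filepaths[j]`, the iterator's own list of file paths, not on the DataFrame. One plausible cause, assuming the Kaggle helper (not shown here) builds a `flow_from_directory` iterator and then overwrites its `filenames` with the DataFrame's paths, is that the installed `keras_preprocessing` version computes that path list once at construction time, so the overwrite never reaches it. The pure-Python model below is a simplified sketch of that hypothesis; `CachedPathIterator` is a hypothetical class, not part of Keras.

```python
class CachedPathIterator:
    """Hypothetical stand-in for a directory iterator that caches its file paths."""

    def __init__(self, directory, filenames):
        self.directory = directory
        self.filenames = list(filenames)
        # Paths are joined once here; later changes to self.filenames are not seen.
        self._filepaths = [f"{directory}/{name}" for name in self.filenames]

    @property
    def filepaths(self):
        return self._filepaths

    def load_batch(self, index_array):
        # Mirrors the failing line in the traceback: index into filepaths.
        return [self.filepaths[j] for j in index_array]


# The directory scan found no files, so the cached path list is empty.
it = CachedPathIterator("/data/images", [])
# Patching filenames afterwards (as a DataFrame wrapper might) does not refresh the cache.
it.filenames = ["a.png", "b.png", "c.png"]

try:
    it.load_batch(range(len(it.filenames)))
except IndexError as exc:
    print("IndexError:", exc)  # prints "IndexError: list index out of range"
```

The built-in `ImageDataGenerator.flow_from_dataframe` used in the solution builds its paths from the DataFrame column when the iterator is created, which is presumably why switching to it removes the error.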
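Solution
I solved the problem by creating a separate DataFrame for each generator. Both train_flow and valid_flow use a flow_from_dataframe wrapper that accepts a seed value, which keeps my training and validation sets the same every time I run the code, as I wanted.
My test flows, on the other hand, do not take a seed value, so I wrote a separate method for them.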
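import gc
from tensorflow.keras.preprocessing.image import ImageDataGenerator  # or keras.preprocessing.image, depending on setup

# Augmentation for training only
train_idg = ImageDataGenerator(zoom_range=0.2,
                               fill_mode='nearest',
                               rotation_range=25,
                               width_shift_range=0.25,
                               height_shift_range=0.25,
                               vertical_flip=False,
                               horizontal_flip=True,
                               shear_range=0.2,
                               samplewise_center=False,
                               samplewise_std_normalization=False)
val_idg = ImageDataGenerator(width_shift_range=0.25,
                             height_shift_range=0.25,
                             horizontal_flip=True)
# No augmentation for the test set
test_idg = ImageDataGenerator()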
####
def flow_from_dataframe(imgDatGen, df, batch_size, seed, img_size):
    gc.collect()
    # Image iterator: yields (images, boneage_zscore) batches
    gen_img = imgDatGen.flow_from_dataframe(dataframe=df,
                                            x_col='path', y_col='boneage_zscore',
                                            batch_size=batch_size, seed=seed, shuffle=True, class_mode='other',
                                            target_size=img_size, color_mode='rgb',
                                            drop_duplicates=False)
    # Same DataFrame and same seed, so its batches cover the same rows as gen_img;
    # only its labels (gender) are used
    gen_gender = imgDatGen.flow_from_dataframe(dataframe=df,
                                               x_col='path', y_col='gender',
                                               batch_size=batch_size, seed=seed, shuffle=True, class_mode='other',
                                               target_size=img_size, color_mode='rgb',
                                               drop_duplicates=False)
    while True:
        X1i = gen_img.next()
        X2i = gen_gender.next()
        gc.collect()
        # Inputs: [images, gender]; target: bone age z-score
        yield [X1i[0], X2i[1]], X1i[1]
####
train_flow = flow_from_dataframe(train_idg, train_df, BATCH_SIZE_TRAIN, SEED, IMG_SIZE)
valid_flow = flow_from_dataframe(val_idg, valid_df, BATCH_SIZE_VAL, SEED, IMG_SIZE)
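The wrapper above depends on both iterators shuffling identically: with the same DataFrame and the same `seed`, batch k of the image iterator and batch k of the gender iterator refer to the same rows, so `[X1i[0], X2i[1]]` pairs each image with its own gender. The pure-Python sketch below models that assumption; `shuffled_batches` and `paired_flow` are hypothetical helpers, not Keras APIs, and they use Python's `random` module rather than whatever RNG Keras uses internally.

```python
import random


def shuffled_batches(rows, batch_size, seed):
    """Yield batches of rows, reshuffled every epoch by a seeded RNG."""
    rng = random.Random(seed)
    while True:
        order = list(range(len(rows)))
        rng.shuffle(order)  # same seed -> same sequence of permutations
        for start in range(0, len(order), batch_size):
            yield [rows[i] for i in order[start:start + batch_size]]


def paired_flow(rows, batch_size, seed):
    """Combine two independently built generators into ([image, gender], age)."""
    gen_img = shuffled_batches(rows, batch_size, seed)
    gen_gender = shuffled_batches(rows, batch_size, seed)
    while True:
        img_batch = next(gen_img)
        gender_batch = next(gen_gender)
        images = [r["path"] for r in img_batch]
        genders = [r["gender"] for r in gender_batch]
        ages = [r["boneage_zscore"] for r in img_batch]
        yield [images, genders], ages


rows = [
    {"path": "1.png", "gender": 1, "boneage_zscore": 0.5},
    {"path": "2.png", "gender": 0, "boneage_zscore": -1.2},
    {"path": "3.png", "gender": 1, "boneage_zscore": 0.1},
    {"path": "4.png", "gender": 0, "boneage_zscore": 2.0},
]
(images, genders), ages = next(paired_flow(rows, batch_size=2, seed=42))
print(images, genders, ages)
```

If the two generators were given different seeds, the gender list would generally belong to different rows than the images, silently corrupting the second input.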
####
def test_gen_2inputs(imgDatGen, df, batch_size, img_size):
    gc.collect()
    # shuffle=False keeps rows in DataFrame order, so no seed is needed
    gen_img = imgDatGen.flow_from_dataframe(dataframe=df,
                                            x_col='path', y_col='boneage_zscore',
                                            batch_size=batch_size, shuffle=False, class_mode='other',
                                            target_size=img_size, color_mode='rgb',
                                            drop_duplicates=False)
    gen_gender = imgDatGen.flow_from_dataframe(dataframe=df,
                                               x_col='path', y_col='gender',
                                               batch_size=batch_size, shuffle=False, class_mode='other',
                                               target_size=img_size, color_mode='rgb',
                                               drop_duplicates=False)
    while True:
        X1i = gen_img.next()
        X2i = gen_gender.next()
        gc.collect()
        yield [X1i[0], X2i[1]], X1i[1]
test_flow = test_gen_2inputs(test_idg, test_df, 789, IMG_SIZE)
male_test_flow = test_gen_2inputs(test_idg, male_df, 789, IMG_SIZE)
female_test_flow = test_gen_2inputs(test_idg, female_df, 789, IMG_SIZE)
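With `shuffle=False` both test iterators walk the DataFrame in row order, so they stay aligned without a seed. With a batch size of 789, a single `next()` returns every test row only if the DataFrame has at most 789 rows; otherwise it returns just the first 789. The sketch below models that batching rule in pure Python; `ordered_batches` is a hypothetical helper that mimics, but is not, the Keras iterator, which likewise returns a shorter final batch instead of padding it.

```python
def ordered_batches(n_rows, batch_size):
    """Yield row-index batches in order (shuffle=False), cycling forever."""
    while True:
        for start in range(0, n_rows, batch_size):
            # The final slice may be shorter than batch_size; it is not padded.
            yield list(range(start, min(start + batch_size, n_rows)))


small = ordered_batches(n_rows=500, batch_size=789)
print(len(next(small)))  # 500: the whole set in one batch

large = ordered_batches(n_rows=1000, batch_size=789)
print(len(next(large)), len(next(large)))  # 789 211: the set is split
```

So before relying on one `next()` call as "the whole test set", it is worth checking that `len(test_df)` (and the male/female subsets) does not exceed the batch size.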
After this, the following code runs successfully:
train_X, train_Y = next(train_flow)
test_X, test_Y = next(test_flow)
male_test_X, male_test_Y = next(male_test_flow)
female_test_X, female_test_Y = next(female_test_flow)