Cannot use R keras fit_generator() with a custom data generator

Problem description

This is my first post here, so any help and/or advice on how to describe the problem is very welcome.

With that said, here is the problem I have been stuck on for hours:

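To train a vgg16 model, I am using a custom R data generator to preprocess the data coming from keras:flow_from_directory. Even though my laptop does not have enough CPU processing power, I was able to get it to run after reducing batch_size, epochs and steps_per_epoch. A simpler piece of code that reproduces this working setup can be found here: Using a custom R generator function with fit_generator (Keras, R)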

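However, the problems started as soon as I moved to a computer with a GPU and tried to use fit_generator with this custom R generator. Training simply gets stuck at the first step of the first epoch, with no response from the R console. This happens both with my own model and with the example model linked above. Here is what I get: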

> library(keras)
> # example data
> data <- data.frame(
+   x = runif(80),
+   y = runif(80),
+   z = runif(80)
+ )
> # example generator
> data_generator <- function(data, x, y, batch_size) {
+   
+   # start iterator
+   i <- 1
+   
+   # return an iterator function
+   function() {
+
+     # reset iterator if already seen all data
+     if ((i + batch_size - 1) > nrow(data)) i <<- 1
+ 
+     # iterate current batch's rows
+     rows <- c(i:min(i + batch_size - 1, nrow(data)))
+     
+     # update to next iteration
+     i <<- i + batch_size
+     
+     # create container arrays
+     x_array <- array(0, dim = c(length(rows), length(x)))
+     y_array <- array(0, dim = c(length(rows), length(y)))
+     
+     # fill the container
+     x_array[1:length(rows), ] <- data[rows, x]
+     y_array[1:length(rows), ] <- data[rows, y]
+     
+     # return the batch
+     list(x_array, y_array)
+     
+   }
+   
+ }
> # set-up a generator
> gen <- data_generator(
+   data = data.matrix(data),
+   x = 1:2, # it is flexible, you can use the column numbers,
+   y = c("y", "z"), # or the column name
+   batch_size = 32
+ )
> # set up a simple keras model
> model <- keras_model_sequential() %>% 
+   layer_dense(32, input_shape = c(2)) %>% 
+   layer_dense(2)
2020-07-16 19:37:32.040393: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-07-16 19:37:35.098731: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-07-16 19:37:35.116724: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 970 computeCapability: 5.2
coreClock: 1.266GHz coreCount: 13 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 208.91GiB/s
2020-07-16 19:37:35.117090: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-07-16 19:37:35.124119: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-07-16 19:37:35.129961: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-07-16 19:37:35.132197: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-07-16 19:37:35.137769: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-07-16 19:37:35.141461: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-07-16 19:37:35.153316: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-07-16 19:37:35.153789: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-07-16 19:37:35.154322: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-07-16 19:37:35.164029: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x24a290c5750 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-16 19:37:35.164373: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-07-16 19:37:35.165056: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 970 computeCapability: 5.2
coreClock: 1.266GHz coreCount: 13 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 208.91GiB/s
2020-07-16 19:37:35.165386: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-07-16 19:37:35.165896: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-07-16 19:37:35.166327: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-07-16 19:37:35.166681: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-07-16 19:37:35.166922: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-07-16 19:37:35.167133: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-07-16 19:37:35.167332: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-07-16 19:37:35.167569: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-07-16 19:37:35.680951: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-16 19:37:35.681298: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0 
2020-07-16 19:37:35.681438: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N 
2020-07-16 19:37:35.681704: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2991 MB memory) -> physical GPU (device: 0, name: GeForce GTX 970, pci bus id: 0000:01:00.0, compute capability: 5.2)
2020-07-16 19:37:35.684551: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x24a4bef6320 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-07-16 19:37:35.684840: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 970, Compute Capability 5.2
> model %>% compile(
+   optimizer = "rmsprop",
+   loss = "mse"
+ )
> # fit using generator
> model %>% fit_generator(
+   generator = gen,
+   steps_per_epoch = 100, # will auto-reset after seeing all samples
+   epochs = 10,
+   max_queue_size = 50
+   
+ )
2020-07-16 19:37:48.296325: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
Epoch 1/10
  1/100 [..............................] - ETA: 0s - loss: 0.4254

As I said before, the same example code runs smoothly on my laptop, where the CPU version of Keras is installed.

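Has anyone run into a similar problem, or does anyone at least know what might be causing it? I am happy to provide more information if needed to clarify the problem.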

Thanks in advance!

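Additional information: when I use a standard generator (e.g. the one I get directly from flow_from_directory), I can usually train models with the GPU installation of TensorFlow without problems.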
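One way to narrow this down is to call the custom generator directly in plain R, with no Keras or TensorFlow involved, and check what it returns. The sketch below re-creates the generator from the transcript unchanged, but feeds it hypothetical integer data (row i holds x = i, y = 100 + i, z = 200 + i) so that each batch shows which rows it came from. This only rules the generator itself in or out; it does not by itself explain the hang on the GPU machine.

```r
# Hypothetical deterministic data: values encode the row number, so batches are easy to read
data <- data.frame(x = 1:80, y = 101:180, z = 201:280)

# Same generator as in the transcript above
data_generator <- function(data, x, y, batch_size) {
  i <- 1
  function() {
    # reset iterator if the next full batch would run past the last row
    if ((i + batch_size - 1) > nrow(data)) i <<- 1
    rows <- c(i:min(i + batch_size - 1, nrow(data)))
    i <<- i + batch_size
    x_array <- array(0, dim = c(length(rows), length(x)))
    y_array <- array(0, dim = c(length(rows), length(y)))
    x_array[1:length(rows), ] <- data[rows, x]
    y_array[1:length(rows), ] <- data[rows, y]
    list(x_array, y_array)
  }
}

gen <- data_generator(data.matrix(data), x = 1:2, y = c("y", "z"), batch_size = 32)

# Pull four batches by hand, exactly as fit_generator would request them
batches <- lapply(1:4, function(k) gen())

# Report which rows each call served (column 1 of the inputs holds the row number)
for (k in seq_along(batches)) {
  served <- batches[[k]][[1]][, 1]
  cat(sprintf("call %d: rows %d-%d\n", k, min(served), max(served)))
}
# call 1: rows 1-32
# call 2: rows 33-64
# call 3: rows 1-32
# call 4: rows 33-64
```

If these calls return instantly, the R function itself is fine and the stall is more likely on the Keras/TensorFlow side of fit_generator. Note also that, with 80 rows and batch_size = 32, the reset condition fires before a partial batch is produced, so rows 65 to 80 are never served and the min() call never changes the result.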

Tags: r, tensorflow, machine-learning, keras
