python - Tensorflow Image Classification Tutorial, crash on the second epoch
问题描述
I'm trying to replicate the TensorFlow Image Classification Tutorial found here https://www.tensorflow.org/tutorials/images/classification
However, for some reason, the program crashes with exit code -1073740791 (0xC0000409), after the first epoch.
Here's the code that I have copied:
import matplotlib.pyplot as plt
import numpy as np
import os
import PIL
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
import pathlib
dataset_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
data_dir = tf.keras.utils.get_file('flower_photos', origin=dataset_url, untar=True)
data_dir = pathlib.Path(data_dir)
image_count = len(list(data_dir.glob('*/*.jpg')))
print(image_count)
roses = list(data_dir.glob('roses/*'))
tulips = list(data_dir.glob('tulips/*'))
batch_size = 32
img_height = 180
img_width = 180
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
data_dir,
validation_split=0.2,#originally 0.2
subset="training",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
data_dir,
validation_split=0.2,
subset="validation",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size)
class_names = train_ds.class_names
#Configure the DataSet for performance
AUTOTUNE = tf.data.AUTOTUNE
#.cache = keeps the img in memory after they have been loaded off disk during the first epoch
#.prefetch = overlaps data preprocessing and model execution while training
train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
#standardizing the RGB values to be within [0,1] range
normalization_layer = layers.experimental.preprocessing.Rescaling(1./255)
normalized_ds = train_ds.map(lambda x, y: (normalization_layer(x), y))
image_batch, labels_batch = next(iter(normalized_ds))
first_image = image_batch[0]
# Notice the pixels values are now in `[0,1]`.
print(np.min(first_image), np.max(first_image))
num_classes = 5
model = Sequential([
layers.experimental.preprocessing.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
layers.Conv2D(16, 3, padding='same', activation='relu'), #Relu is Mutually exlusive, the initial classification
layers.MaxPooling2D(),
layers.Conv2D(32, 3, padding='same', activation='relu'),
layers.MaxPooling2D(),
layers.Conv2D(64, 3, padding='same', activation='relu'), #
layers.MaxPooling2D(),
layers.Flatten(),
layers.Dense(128, activation='relu'),
layers.Dense(num_classes)
])
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
model.summary()
epochs=10
history = model.fit(
train_ds,
validation_data=val_ds,
epochs=epochs
)
and here's the output:
2021-03-10 11:33:28.285891: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
3670
Found 3670 files belonging to 5 classes.
Using 2936 files for training.
2021-03-10 11:33:40.172866: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-03-10 11:33:40.182801: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-03-10 11:33:40.600845: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce 940MX computeCapability: 5.0
coreClock: 0.8605GHz coreCount: 4 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 37.33GiB/s
2021-03-10 11:33:40.602057: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-03-10 11:33:40.680462: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-03-10 11:33:40.680774: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-03-10 11:33:40.725064: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-03-10 11:33:40.762494: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-03-10 11:33:40.826237: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-03-10 11:33:40.879988: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-03-10 11:33:40.886178: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-03-10 11:33:41.052224: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-03-10 11:33:41.065505: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-03-10 11:33:41.067812: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce 940MX computeCapability: 5.0
coreClock: 0.8605GHz coreCount: 4 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 37.33GiB/s
2021-03-10 11:33:41.068679: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-03-10 11:33:41.069536: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-03-10 11:33:41.069884: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-03-10 11:33:41.070227: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-03-10 11:33:41.072721: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-03-10 11:33:41.072988: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-03-10 11:33:41.073254: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-03-10 11:33:41.073754: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-03-10 11:33:41.074267: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-03-10 11:33:43.232733: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-10 11:33:43.233087: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-03-10 11:33:43.233294: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-03-10 11:33:43.234651: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1366 MB memory) -> physical GPU (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0, compute capability: 5.0)
2021-03-10 11:33:43.238580: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
Found 3670 files belonging to 5 classes.
Using 734 files for validation.
2021-03-10 11:33:43.923781: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-03-10 11:33:54.037806: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:177] Filling up shuffle buffer (this may take a while): 53 of 1000
2021-03-10 11:34:03.968256: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:177] Filling up shuffle buffer (this may take a while): 87 of 1000
2021-03-10 11:34:04.136991: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:230] Shuffle buffer filled.
0.0 1.0
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
rescaling_1 (Rescaling) (None, 180, 180, 3) 0
_________________________________________________________________
conv2d (Conv2D) (None, 180, 180, 16) 448
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 90, 90, 16) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 90, 90, 32) 4640
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 45, 45, 32) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 45, 45, 64) 18496
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 22, 22, 64) 0
_________________________________________________________________
flatten (Flatten) (None, 30976) 0
_________________________________________________________________
dense (Dense) (None, 128) 3965056
_________________________________________________________________
dense_1 (Dense) (None, 5) 645
=================================================================
Total params: 3,989,285
Trainable params: 3,989,285
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
2021-03-10 11:34:08.244592: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-03-10 11:34:09.792816: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-03-10 11:34:09.825987: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
Process finished with exit code -1073740791 (0xC0000409)
What am I doing wrong & what's the cause of this problem?
My specs: Intel (R) Core (TM) i5-7200U CPU @ 2.50GHz 2.70GHz
8GB RAM
- bit operating system, x64-based processor
NVIDIA GEFORCE 940MX with GeForce Game Ready Driver (v. 461.72)
Python version: 3.8.5
TensorFlow: tensorflow._api.v2.version
CUDA: 11.0
解决方案
I deleted cudnn64_8.dll from the CUDA folder, and added CUDNN main folder to the path (instead of the pin as before.
推荐阅读
- azure - 我无法在 azure 上部署我的 nextjs 应用程序
- alias - 如何使用别名来简化 CUDA_VISIBLE_DEVICES
- javascript - 如何在java中构造一个类似json的body对象?
- laravel - 从控制器渲染图表以查看
- php - CodeIgniter 4 重定向功能不起作用
- python - Django manage.py:错误:无法识别的参数:runserver
- python - 使用 lmfit 和差分进化方法设置迭代限制
- javascript - 如何根据 Redux 上按下的按钮来实现从 API 接收不同数据的按钮?
- javascript - 带有 JQuery 的 IE6 Ajax
- java - 使用 Maven 从 Ubuntu 集成 Jenkins 中的 Selenium