How to build a basic MNIST neural network with TensorFlow 2.0?

Problem description

I am trying to build a neural network model with TensorFlow 2.0, but I can't find anything online about how to do this in TensorFlow 2.0.

I gave it a try, but I don't know how to apply the gradients and so on.

Here is what I have tried:

import math
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# flatten the 28x28 images into 784-dimensional vectors
x_train = tf.reshape(x_train, shape=(60000, 28*28))
x_test = tf.reshape(x_test, shape=(10000, 28*28))

x_train = tf.cast(x_train, tf.float32)
x_test = tf.cast(x_test, tf.float32)

n_input = 784
h1 = 512
h2 = 128
n_classes = 10

# weights and bias initializations
f1 = tf.Variable(tf.random.uniform(shape = (n_input,h1), minval = -(math.sqrt(6)/math.sqrt(n_input+h1)),  
                            maxval = (math.sqrt(6)/math.sqrt(n_input+h1)))) # Xavier uniform
f2 = tf.Variable(tf.random.uniform(shape = (h1,h2), minval = -(math.sqrt(6)/math.sqrt(h1+h2)),
                             maxval = (math.sqrt(6)/math.sqrt(h1+h2)))) 
out = tf.Variable(tf.random.uniform(shape = (h2,n_classes), minval = -(math.sqrt(6/(h2+n_classes))),
                                   maxval = math.sqrt(6/(h2+n_classes)) ))

b1 = tf.Variable(tf.random.uniform([h1]))
b2 = tf.Variable(tf.random.uniform([h2]))
b_out = tf.Variable(tf.random.uniform([n_classes]))

def mlp(x):
  input1 = tf.nn.sigmoid(tf.add(tf.matmul(x, f1), b1))
  input2 = tf.nn.sigmoid(tf.add(tf.matmul(input1, f2), b2))  
  output = tf.nn.softmax(tf.add(tf.matmul(input2, out), b_out))
  return output

n_shape = x_train.shape[0]
epochs = 2
batch_size = 128
lr_rate = 0.001

data_gen = tf.data.Dataset.from_tensor_slices((x_train, y_train)).repeat().shuffle(n_shape).batch(batch_size)

def grad(x, y):
  with tf.GradientTape() as tape:
    y_pred = mlp(x)
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=y_pred)
    loss = tf.reduce_mean(loss)
    return tape.gradient(loss, [w, b])

optimizer = tf.keras.optimizers.Adam(lr_rate)

for _ in range(epochs):
  no_steps = int(60000/128)
  for (batch_xs, batch_ys) in data_gen.take(no_steps):

I just don't know how to proceed from here. I would really appreciate your help. Thanks.

Tags: python, tensorflow

Solution


There are the following problems in your code:

  • You forgot to rescale your data: x_train, x_test = x_train / 255.0, x_test / 255.0
  • w and b in the line tape.gradient(loss, [w, b]) are not defined anywhere in your code.
  • The valid dtype for labels in tf.nn.sparse_softmax_cross_entropy_with_logits is int32 or int64, and for logits it should be float16, float32, or float64. In your case the labels are uint8, so cast them to int32 before passing them in, like this:

    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=tf.cast(y, dtype=tf.int32), logits=y_pred)

  • According to the official documentation:

    Warning: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.

    Therefore, remove the tf.nn.softmax from the output of the mlp function, since the op already applies softmax to the logits internally. A sketch of the corrected forward pass is shown after this list.

    For more information on tf.nn.sparse_softmax_cross_entropy_with_logits, see the official documentation.
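
Putting the rescaling fix and the softmax removal together, a minimal sketch of the corrected preprocessing and forward pass could look like this (it reuses the f1, f2, out, b1, b2, b_out variables defined above; the rescaling line belongs right after the cast to float32):

x_train, x_test = x_train / 255.0, x_test / 255.0   # rescale pixel values to [0, 1]

def mlp(x):
  # hidden layers keep their sigmoid activations
  hidden1 = tf.nn.sigmoid(tf.add(tf.matmul(x, f1), b1))
  hidden2 = tf.nn.sigmoid(tf.add(tf.matmul(hidden1, f2), b2))
  # return unscaled logits: no tf.nn.softmax here, because
  # tf.nn.sparse_softmax_cross_entropy_with_logits applies it internally
  return tf.add(tf.matmul(hidden2, out), b_out)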

You should modify your grad function and the for loop to something like this:

def grad(x, y):
  with tf.GradientTape() as tape:
    y_pred = mlp(x)  # raw logits from the network
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=tf.cast(y, dtype=tf.int32), logits=y_pred)
    loss = tf.reduce_mean(loss)
    # differentiate the loss with respect to every trainable variable
    return loss, tape.gradient(loss, [f1, b1, f2, b2, out, b_out])

optimizer = tf.keras.optimizers.Adam(lr_rate)

for epoch in range(epochs):
  no_steps = n_shape//batch_size
  for (batch_xs, batch_ys) in data_gen.take(no_steps):
    cost, grads = grad(batch_xs, batch_ys)
    # update the weights and biases with the computed gradients
    optimizer.apply_gradients(zip(grads, [f1, b1, f2, b2, out, b_out]))
  print('epoch: {} loss: {}'.format(epoch, cost))
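
As a quick sanity check after training, something along these lines should report the accuracy on the held-out test set (assuming x_test and y_test were preprocessed as above; tf.argmax over the raw logits is enough for prediction, since softmax does not change which class scores highest):

test_logits = mlp(x_test)                                   # forward pass on the 10000 test images
predictions = tf.argmax(test_logits, axis=1, output_type=tf.int32)
accuracy = tf.reduce_mean(tf.cast(tf.equal(predictions, tf.cast(y_test, tf.int32)), tf.float32))
print('test accuracy: {}'.format(accuracy.numpy()))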
