How to effectively use a tf.data.Dataset made of OrderedDicts?

Problem description

With TensorFlow 2.3.1, the following code snippet fails.

import tensorflow as tf

url = "https://storage.googleapis.com/download.tensorflow.org/data/creditcard.zip"

tf.keras.utils.get_file(
    origin=url,
    fname='creditcard.zip',
    cache_dir="/tmp/datasets/",
    extract=True)

ds = tf.data.experimental.make_csv_dataset(
    "/tmp/datasets/*.csv",
    batch_size=2048,
    label_name="Class",
    select_columns=["V1","V2","Class"],
    num_rows_for_inference=None,
    shuffle_buffer_size=600,
    ignore_errors=True)

model = tf.keras.Sequential(
    [
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid", name="labeling"),
    ],
)

model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-2),
    loss="binary_crossentropy", 
)

model.fit(
    ds,
    steps_per_epoch=5,
    epochs=3,
)

The error stack trace is

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-c79f80f9d0fd> in <module>
----> 1 model.fit(
      2     ds,
      3     steps_per_epoch=5,
      4     epochs=3,
      5 )

[...]

    ValueError: Layer sequential expects 1 inputs, but it received 2 input tensors. Inputs received: [<tf.Tensor 'ExpandDims:0' shape=(2048, 1) dtype=float32>, <tf.Tensor 'ExpandDims_1:0' shape=(2048, 1) dtype=float32>]
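
For context: make_csv_dataset yields each batch as a pair (features, labels), where features is a collections.OrderedDict mapping every selected column name to its own tensor. The Sequential model therefore receives two separate feature tensors ("V1" and "V2") instead of the single input it expects. A minimal sketch to inspect one batch of the ds built above (the loop variable names are mine):

for features, labels in ds.take(1):
    print(type(features))         # <class 'collections.OrderedDict'>
    print(list(features.keys()))  # ['V1', 'V2']
    print(features["V1"].shape)   # (2048,) for a full batch, one tensor per column
    print(labels.shape)           # (2048,), taken from the "Class" column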

The workaround I have been using so far is

def workaround(features, labels):
    return (tf.stack(list(features.values()), axis=1), labels)

model.fit(
    ds.map(workaround),
    steps_per_epoch=5,
    epochs=3,
)
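
As a quick sanity check (a sketch assuming the workaround above), the mapped dataset now yields a single stacked feature tensor of shape (batch_size, 2) next to the labels, which is what the Sequential model can consume:

for features, labels in ds.map(workaround).take(1):
    print(features.shape)  # (2048, 2): V1 and V2 stacked column-wise
    print(labels.shape)    # (2048,)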

My questions for you TF gurus:

Tags: tensorflow, keras, tensorflow2.0, tensorflow-datasets

Solution


I am not sure whether your data fits in memory with your current code.

If it does not, you can change your code like this:

import tensorflow as tf

url = "https://storage.googleapis.com/download.tensorflow.org/data/creditcard.zip"
ds = tf.data.experimental.make_csv_dataset(
    "/tmp/datasets/*.csv",
    batch_size=2048,
    label_name="Class",
    select_columns=["V1","V2","Class"],
    num_rows_for_inference=None,
    ignore_errors=True,
    num_epochs=1,
    shuffle_buffer_size=2048 * 1000,
    prefetch_buffer_size=tf.data.experimental.AUTOTUNE,
)

input_list = []
for column in ["V1", "V2"]:
    # One scalar input per selected feature column, named after the column so
    # Keras can match the keys of the OrderedDict yielded by make_csv_dataset.
    _input = tf.keras.Input(shape=(1,), name=column)
    input_list.append(_input)

concat = tf.keras.layers.Concatenate(name="concat")(input_list)
dense = tf.keras.layers.Dense(256, activation="relu", name="dense", dtype="float64")(concat)
output_dense = tf.keras.layers.Dense(1, activation="sigmoid", name="labeling", dtype="float64")(dense)
model = tf.keras.Model(inputs=input_list, outputs=output_dense)

model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-2),
    loss="binary_crossentropy", 
)

model.fit(
    ds,
    steps_per_epoch=5,
    epochs=10,
)
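
A couple of notes on this version: naming each tf.keras.Input after its CSV column lets Keras match the keys of the OrderedDict coming out of make_csv_dataset to the corresponding input, so no stacking step is needed. Setting num_epochs=1 makes the dataset finite (by default make_csv_dataset repeats indefinitely), and prefetch_buffer_size=tf.data.experimental.AUTOTUNE lets the input pipeline prepare upcoming batches while the model is training.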
