TensorFlow: Dataset.from_generator() ValueError: setting an array element with a sequence

Problem description

Goal

I want to use a tensor as part of the input in the Dataset.from_generator method.

Error message

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1321     try:
-> 1322       return fn(*args)
   1323     except errors.OpError as e:

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
   1306       return self._call_tf_sessionrun(
-> 1307           options, feed_dict, fetch_list, target_list, run_metadata)
   1308 

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/client/session.py in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
   1408           self._session, options, feed_dict, fetch_list, target_list,
-> 1409           run_metadata)
   1410     else:

InvalidArgumentError: ValueError: setting an array element with a sequence.
Traceback (most recent call last):

  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/ops/script_ops.py", line 157, in __call__
    ret = func(*args)

  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 391, in generator_py_func
    nest.flatten_up_to(output_types, values), flattened_types)

  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 390, in <listcomp>
    for ret, dtype in zip(

  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/ops/script_ops.py", line 124, in _convert
    result = np.asarray(value, dtype=dtype, order="C")

  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/numpy/core/numeric.py", line 492, in asarray
    return array(a, dtype, copy=False, order=order)

ValueError: setting an array element with a sequence.


     [[Node: PyFunc = PyFunc[Tin=[DT_INT64], Tout=[DT_FLOAT, DT_FLOAT], token="pyfunc_150"](arg0)]]
     [[Node: IteratorGetNext_22 = IteratorGetNext[output_shapes=[<unknown>, <unknown>], output_types=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](OneShotIterator_22)]]

During handling of the above exception, another exception occurred:

Reproducing the error

If the definition b = tf.ones(...) is replaced with the commented-out b = np.random.randn(1), the error disappears.

import numpy as np
import tensorflow as tf

def _create_generator():
    for i in range(3):
        a = np.random.randn(3,2)
        b = tf.ones([1], tf.float32)   # yielding a tf.Tensor triggers the ValueError
        # b = np.random.randn(1)       # yielding a NumPy array instead avoids it
        result = {}
        result['a'] =  a
        result['b'] = b
        yield result


gen = _create_generator()

dataset = tf.data.Dataset().from_generator(_create_generator,
                        output_shapes={'a':None,'b':None},
                        output_types ={'a':tf.float32, 'b':tf.float32}).batch(1)
iterator = dataset.make_one_shot_iterator()
features = iterator.get_next()


init = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init)
    print(sess.run(features))
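
The traceback above ends in np.asarray(value, dtype=dtype, order="C"), so every value the generator yields goes through a NumPy conversion before TensorFlow sees it. The following is a minimal sketch of that one step, assuming nothing beyond what the traceback shows. The helper name convert_like_from_generator is hypothetical, and the ragged list is only a stand-in for "a value NumPy cannot turn into a regular float array"; it is not meant to reproduce how a tf.Tensor is handled internally.

```python
import numpy as np


def convert_like_from_generator(value, dtype):
    # Mirrors the np.asarray call that appears in the traceback.
    return np.asarray(value, dtype=dtype, order="C")


# A NumPy array (like the commented-out `b`) converts cleanly.
ok = convert_like_from_generator(np.random.randn(1), np.float32)

# Stand-in for an unconvertible value: rows of different lengths.
ragged = [[1.0, 2.0], [3.0]]
message = None
try:
    convert_like_from_generator(ragged, np.float32)
except ValueError as err:
    message = str(err)

print(ok.dtype, ok.shape)
print(message)
```

If this reading is right, the generator has to yield plain Python or NumPy values, which matches the observation that switching b to np.random.randn(1) makes the error go away.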

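Why I have to use a tensor as input

This is because my real program needs to use the output of another tf.data.Dataset as part of the input (the data is stored in TFRecords format), so it raises exactly the same error you see when running the code above. For now I do not know of a way to work around this indirectly, that is, without using a tensor as input.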

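Why I need to use Dataset.from_generator

There is a trick for using estimator.predict() without reloading the graph on every call: feed it from a generator that keeps the input open, so it assumes the "single" prediction run has not finished yet. TensorFlow then does not load the model graph again and again.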
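
The keep-alive idea can be sketched without TensorFlow at all. In the sketch below, fake_predict is a hypothetical stand-in for estimator.predict (it does its expensive setup once and then yields one result per input), and KeepAlivePredictor, _STOP and load_calls are names invented for this illustration. It only shows the control flow under those assumptions, not how the Estimator is implemented.

```python
import queue

_STOP = object()      # sentinel that ends the input stream
load_calls = []       # records each simulated "graph load"


def fake_predict(inputs):
    """Hypothetical stand-in for estimator.predict: set up once, then stream."""
    load_calls.append("graph loaded")   # the expensive step we want only once
    for x in inputs:
        yield x * 2                      # pretend this is the model's prediction


class KeepAlivePredictor:
    """Feeds one long-lived generator so predict_fn is started only once."""

    def __init__(self, predict_fn):
        self._requests = queue.Queue()
        # fake_predict is a generator function, so nothing runs until next().
        self._results = predict_fn(self._feed())

    def _feed(self):
        # Never finishes on its own, so the consumer believes the "single"
        # prediction run is still in progress.
        while True:
            item = self._requests.get()
            if item is _STOP:
                return
            yield item

    def predict(self, x):
        self._requests.put(x)        # enqueue first, so get() never blocks
        return next(self._results)   # advance the consumer by one result

    def close(self):
        self._requests.put(_STOP)
        next(self._results, None)    # let both generators run to completion


if __name__ == "__main__":
    predictor = KeepAlivePredictor(fake_predict)
    print(predictor.predict(1), predictor.predict(5))
    print(len(load_calls))
    predictor.close()
```

With a real Estimator the generator would presumably be handed to Dataset.from_generator inside the input_fn and consumed by TensorFlow rather than by a plain for loop, which is why the values it yields have to be something from_generator can convert.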


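If you need any more information, let me know. Thanks!

Edit 1:

Why I have to use the Dataset API

The data is huge and is originally stored on HDFS, so the pipeline processes it in Spark and saves it as TFRecord. As far as I know, the Dataset API is the only way I can read my data back here (also taking performance into account).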

Tags: python-3.x, tensorflow, tensorflow-datasets

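Solution

Following further discussion in the comments: Estimator.predict does not do anything magical. Admittedly there is some fancy machinery in it, mostly for running in parallel across multiple GPUs, but you can always build the graph manually via Estimator.model_fn: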

estimator = get_estimator()           # however you generate it
features, labels = input_fn()         # whatever you would use with `predict`
mode = tf.estimator.ModeKeys.PREDICT  # or TRAIN/EVAL
# depending on your estimator, you may not need mode/config args
spec = estimator.model_fn(features, labels, mode, config=None)
# spec is a tf.estimator.EstimatorSpec - a named tuple
predictions = spec.predictions
# you might have to flatten the function inputs/outputs/Tout below
next_features = tf.py_func(
    next_features_fn, predictions, Tout={'a': tf.float32, 'b': tf.float32})
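
The "flatten" comment above matters because tf.py_func takes its inputs and Tout as flat lists rather than dicts. One way to bridge the two, sketched here in plain Python, is to fix a key order, pass values in that order, and rebuild the dict afterwards. The helpers flatten_dict and make_flat_fn are hypothetical names, the strings "float32" merely stand in for tf.float32, and the body of next_features_fn is made up for the example.

```python
def flatten_dict(d):
    """Return (keys, values) with keys in sorted order, so the order is stable."""
    keys = sorted(d)
    return keys, [d[k] for k in keys]


def make_flat_fn(dict_fn, in_keys, out_keys):
    """Wrap dict_fn(dict) -> dict as flat_fn(*values) -> list of values."""
    def flat_fn(*flat_inputs):
        outputs = dict_fn(dict(zip(in_keys, flat_inputs)))
        return [outputs[k] for k in out_keys]
    return flat_fn


def next_features_fn(predictions):
    # Made-up body: derive the next features from the current predictions.
    return {"a": predictions["score"] + 1, "b": predictions["score"] * 2}


# Flatten the inputs and the output spec once, up front.
in_keys, flat_inputs = flatten_dict({"score": 3.0})
out_keys, flat_tout = flatten_dict({"a": "float32", "b": "float32"})

flat_fn = make_flat_fn(next_features_fn, in_keys, out_keys)
flat_outputs = flat_fn(*flat_inputs)

# Re-pack the flat outputs into the dict layout the rest of the code expects.
next_features = dict(zip(out_keys, flat_outputs))
print(flat_tout, next_features)
```

In the snippet above, flat_fn, flat_inputs and flat_tout would take the places of next_features_fn, predictions and the dict passed as Tout, with the result zipped back against out_keys.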
