How to use tensorflow sequence_numeric_column with RNNClassifier?

Problem description

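I am exploring the tensorflow contrib API and would like to use the RNNClassifier provided in Tensorflow 1.13. Unlike the non-sequence estimators, this estimator accepts only sequence feature columns. However, I cannot get it to work on a toy dataset: I keep getting an error when using sequence_numeric_column.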

Here is the structure of my toy dataset:

idSeq,kind,label,size
0,0,dwarf,117.6
0,0,dwarf,134.4
0,0,dwarf,119.0
0,1,human,168.0
0,1,human,145.25
0,2,elve,153.9
0,2,elve,218.49999999999997
0,2,elve,210.9
1,0,dwarf,166.6
1,0,dwarf,168.0
1,0,dwarf,131.6
1,1,human,150.5
1,1,human,208.25
1,1,human,210.0
1,2,elve,199.5
1,2,elve,161.5
1,2,elve,197.6

Here idSeq shows which rows belong to which sequence. I want to predict the "kind" column from the "size" column.

Below is the code that trains the RNN on my dataset.

import os

import numpy as np
import pandas as pd
import tensorflow as tf


os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
tf.logging.set_verbosity(tf.logging.INFO)

dataframe = pd.read_csv("data_rnn.csv")
dataframe_test = pd.read_csv("data_rnn_test.csv")


train_x = dataframe
train_y = dataframe.loc[:, ["kind"]]


size_feature_col = tf.contrib.feature_column.sequence_numeric_column('size')


estimator = tf.contrib.estimator.RNNClassifier(
    sequence_feature_columns=[size_feature_col],
    num_units=[32, 16],
    cell_type='lstm',
    model_dir=None,
    n_classes=3,
    optimizer='Adagrad'
)



def make_dataset(
    batch_size, 
    x, 
    y=None, 
    shuffle=False, 
    shuffle_buffer_size=1000,
    shuffle_seed=1):
    """
    An input function for training, evaluation or prediction.

    Parameters
    ----------------------
    batch_size: integer
        the size of the batch to use for the training of the neural network
    x: pandas dataframe 
        dataframe that contains the features of the samples to study
    y: pandas dataframe or array (Default: None)
        dataframe or array that contains the values to predict of the samples
        to study. If none, we want a dataset for evaluation or prediction.
    shuffle: boolean (Default: False)
        if True, we shuffle the elements of the dataset
    shuffle_buffer_size: integer (Default: 1000)
        if we shuffle the elements of the dataset, it is the size of the buffer
        used for it.
    shuffle_seed : integer
        the random seed for the shuffling

    Returns
    ---------------------
    input_fn: callable
        a zero-argument function that builds the dataset and returns a nested
        structure of tf.Tensors containing its next element
    """

    def input_fn():
        if y is not None:
            dataset = tf.data.Dataset.from_tensor_slices((dict(x), y))
        else:
            dataset = tf.data.Dataset.from_tensor_slices(dict(x))
        if shuffle:
            dataset = dataset.shuffle(
                buffer_size=shuffle_buffer_size,
                seed=shuffle_seed).batch(batch_size).repeat()
        else:
            dataset = dataset.batch(batch_size)
        return dataset.make_one_shot_iterator().get_next()

    return input_fn



batch_size = 50
random_seed = 1


input_fn_train = make_dataset(
            batch_size=batch_size, 
            x=train_x, 
            y=train_y, 
            shuffle=True, 
            shuffle_buffer_size=len(train_x),
            shuffle_seed=random_seed)

estimator.train(input_fn=input_fn_train, steps=5000)

But all I get is the following error:

INFO:tensorflow:Calling model_fn.
Traceback (most recent call last):
  File "main.py", line 125, in <module>
    estimator.train(input_fn=input_fn_train, steps=5000)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 358, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1124, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1154, in _train_model_default
    features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1112, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/contrib/estimator/python/estimator/rnn.py", line 512, in _model_fn
    config=config)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/contrib/estimator/python/estimator/rnn.py", line 332, in _rnn_model_fn
    logits, sequence_length_mask = logit_fn(features=features, mode=mode)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/contrib/estimator/python/estimator/rnn.py", line 226, in rnn_logit_fn
    features=features, feature_columns=sequence_feature_columns)
  File "/root/.local/lib/python3.5/site-packages/tensorflow/contrib/feature_column/python/feature_column/sequence_feature_column.py", line 120, in sequence_input_layer
    trainable=trainable)
  File "/root/.local/lib/python3.5/site-packages/tensorflow/contrib/feature_column/python/feature_column/sequence_feature_column.py", line 496, in _get_sequence_dense_tensor
    sp_tensor, default_value=self.default_value)
  File "/root/.local/lib/python3.5/site-packages/tensorflow/python/ops/sparse_ops.py", line 1432, in sparse_tensor_to_dense
    sp_input = _convert_to_sparse_tensor(sp_input)
  File "/root/.local/lib/python3.5/site-packages/tensorflow/python/ops/sparse_ops.py", line 68, in _convert_to_sparse_tensor
    raise TypeError("Input must be a SparseTensor.")
TypeError: Input must be a SparseTensor.

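So I do not understand what I am doing wrong: the documentation says that the RNNEstimator must be given sequence columns, but it says nothing about passing a sparse tensor.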

Thanks in advance for your help and advice.

Tags: tensorflow, recurrent-neural-network

Solution
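
One reading of the traceback, offered as a sketch rather than a confirmed fix: sequence_numeric_column calls sparse_tensor_to_dense on the "size" feature, so that feature apparently has to arrive as a SparseTensor of shape [batch, max_sequence_length]. The input_fn in the question instead yields one dense scalar per CSV row. The plain-Python example below groups rows into variable-length sequences and derives the three components that tf.SparseTensor takes (indices, values, dense_shape). The helper names (group_sequences, to_sparse_components, to_sparse_tensor) are hypothetical, and treating each (idSeq, kind) pair as one sequence is an assumption about the dataset, not something the question states.

```python
import csv
import io
from collections import OrderedDict

# First rows of the toy dataset from the question (same columns).
CSV_TEXT = """idSeq,kind,label,size
0,0,dwarf,117.6
0,0,dwarf,134.4
0,0,dwarf,119.0
0,1,human,168.0
0,1,human,145.25
0,2,elve,153.9
"""


def group_sequences(csv_text):
    """Group rows into sequences.

    ASSUMPTION: one sequence = all rows sharing the same (idSeq, kind).
    Returns (sequences, labels), one list of sizes and one label per sequence.
    """
    groups = OrderedDict()
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (int(row["idSeq"]), int(row["kind"]))
        groups.setdefault(key, []).append(float(row["size"]))
    sequences = list(groups.values())
    labels = [kind for (_, kind) in groups]
    return sequences, labels


def to_sparse_components(sequences):
    """Return COO-style (indices, values, dense_shape) for ragged sequences."""
    indices, values = [], []
    for i, seq in enumerate(sequences):
        for t, value in enumerate(seq):
            indices.append([i, t])  # [sequence index, time step]
            values.append(value)
    max_len = max((len(seq) for seq in sequences), default=0)
    return indices, values, [len(sequences), max_len]


def to_sparse_tensor(indices, values, dense_shape):
    """Wrap the components in a tf.SparseTensor.

    TensorFlow is imported lazily; this function is not called in this
    example and has not been tested against TF 1.13.
    """
    import tensorflow as tf
    return tf.SparseTensor(indices=indices, values=values,
                           dense_shape=dense_shape)


sequences, labels = group_sequences(CSV_TEXT)
indices, values, dense_shape = to_sparse_components(sequences)
print(dense_shape, labels)
```

Under these assumptions, an input_fn would return {"size": to_sparse_tensor(indices, values, dense_shape)} as the features together with one label per sequence, instead of slicing the DataFrame row by row. Whether RNNClassifier then trains as expected on this data has not been verified here.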

