Am I missing something? Error in a simple classifier input function in TensorFlow

Problem description

I've been following the freeCodeCamp tutorial on TensorFlow and trying to adapt the basic classifier to work with one of my own structured datasets.

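I have a training dataset and a test dataset, each containing some integer columns and some string columns. I'm trying to predict the value in the Allocated column, but every time I call the classifier.train method it throws this error: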

UnimplementedError: Cast string to float is not supported
     [[{{node head/losses/Cast}}]]

During handling of the above exception, another exception occurred:

UnimplementedError                        Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1392                     '\nsession_config.graph_options.rewrite_options.'
   1393                     'disable_meta_optimizer = True')
-> 1394       raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
   1395 
   1396   def _extend_graph(self):

UnimplementedError: Cast string to float is not supported
     [[node head/losses/Cast (defined at /usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/head/binary_class_head.py:255) ]]

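I tried converting the dataset so that all values are integers or floats, but I keep getting the same error. As far as I know, the classifier should be able to operate on different data types, so unless I need to define them somewhere, I don't understand why this is a problem.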

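I know it's reading the data correctly, because when I use the .head() function everything is formatted correctly. I've been stuck on this error for days and can't figure out what I'm missing. Any help would be greatly appreciated. My code is below.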

%tensorflow_version 2.x 

from __future__ import absolute_import, division, print_function, unicode_literals
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import clear_output
from six.moves import urllib
import tensorflow.compat.v2.feature_column as fc
import tensorflow as tf

CSV_COLUMN_NAMES = ['GroupNumber', 'GroupUnit', 'GroupSkill1', 'GroupSkill2', 'GroupSkill3', 'GroupSkill4', 'GroupPreference1', 
                'GroupPreference2', 'GroupPreference3', 'ProjectNumber', 'ProjectUnit', 'ProjectSkill1', 'ProjectSkill2', 'ProjectSkill3', 'ProjectSkill4', 'ProjectPreference1', 'ProjectPreference2', 'ProjectPreference3', 'Allocated']
ALLOCATED = [0, 1]

train = pd.read_csv('https://raw.githubusercontent.com/nickjackson862/machine-learning/main/trainData40_10.csv', names=CSV_COLUMN_NAMES, header=0)
test = pd.read_csv('https://raw.githubusercontent.com/nickjackson862/machine-learning/main/testData40_10.csv', names=CSV_COLUMN_NAMES, header=0)

train_y = train.pop('Allocated')
test_y = test.pop('Allocated')
train.head()


def input_fn(features, labels, training=True, batch_size=100):   
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))    
    if training:
        dataset = dataset.shuffle(10).repeat()    
    return dataset.batch(batch_size)

my_feature_columns = []
for key in train.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))

classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    hidden_units=[50, 20],
    n_classes=2)

classifier.train(
    input_fn=lambda: input_fn(train, train_y, training=True),
    steps=100)

eval_result = classifier.evaluate(
    input_fn=lambda: input_fn(test, test_y, training=False))

print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))

Tags: python, python-3.x, pandas, tensorflow, machine-learning

Solution


I found the problem in the line where you create the feature columns:

my_feature_columns.append(tf.feature_column.numeric_column(key=key))

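You are making every feature a numeric feature, but looking at the dataset, several of the fields are strings (by the way, your CSV files are public; you may want to fix that).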
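Checking `dtypes` is a more reliable test than `.head()`, because `.head()` shows the values but not how pandas stored them; string columns come back with a non-numeric dtype (usually `object`). A minimal sketch, using a small hypothetical DataFrame in place of the real CSV (the column values are made up for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the real training data: two numeric columns
# and one string column, loosely mirroring the CSV's layout.
train = pd.DataFrame({
    "GroupNumber": [1, 2, 3],
    "GroupSkill1": ["python", "java", "sql"],
    "ProjectNumber": [10, 11, 12],
})

# .dtypes reports how each column is actually stored.
print(train.dtypes)

# Split the column names by whether pandas stored them as numbers.
numeric_cols = list(train.select_dtypes(include="number").columns)
string_cols = list(train.select_dtypes(exclude="number").columns)

print("numeric columns:", numeric_cols)
print("non-numeric columns:", string_cols)
```

Any column listed as non-numeric will fail if it is fed through `tf.feature_column.numeric_column`.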

"I tried converting the dataset so that all values are integers or floats, but I keep getting the same error."

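I believe that conversion wasn't done correctly. I just ran your code with all of the string-type columns removed, and it ran successfully without any errors. All I did was add the following lines after reading the CSV files: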

string_columns = ['GroupSkill1', 'GroupSkill2', 'GroupSkill3', 'GroupSkill4', 'ProjectSkill1', 'ProjectSkill2', 'ProjectSkill3', 'ProjectSkill4']
train.drop(columns=string_columns, inplace=True)
test.drop(columns=string_columns, inplace=True)
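If the goal is simply to keep the numeric columns, the drop can also be driven by dtype instead of a hand-written list. This is a sketch that assumes every non-numeric column should be discarded; the two frames below are hypothetical stand-ins for `train` and `test`:

```python
import pandas as pd

# Hypothetical miniature train/test frames sharing the same schema.
train = pd.DataFrame({
    "GroupNumber": [1, 2],
    "GroupSkill1": ["python", "java"],
    "ProjectNumber": [10, 11],
})
test = pd.DataFrame({
    "GroupNumber": [3],
    "GroupSkill1": ["sql"],
    "ProjectNumber": [12],
})

# Pick the non-numeric columns from the training set only...
non_numeric = list(train.select_dtypes(exclude="number").columns)

# ...and drop exactly that list from both frames so their columns match.
train = train.drop(columns=non_numeric)
test = test.drop(columns=non_numeric)

print(list(train.columns), list(test.columns))
```

Choosing the columns from `train` and applying the same list to `test` keeps both frames on an identical schema, which the feature columns built from `train.keys()` rely on. Note that dropping the skill columns also throws away whatever predictive information they carry.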

See this article for suggestions on creating feature columns for your non-numeric data: https://www.tensorflow.org/tutorials/structured_data/feature_columns
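Along the lines of that tutorial, one way to keep the string columns instead of dropping them is to wrap each one in a vocabulary-based categorical column and then an `indicator_column`, since `DNNClassifier` needs dense inputs. This is a minimal sketch, assuming a TensorFlow 2.x release where `tf.feature_column` is still available (it is deprecated in recent versions); `build_feature_columns` is a hypothetical helper name, and the vocabulary is simply every distinct value seen in the frame passed in:

```python
import pandas as pd
import tensorflow as tf


def build_feature_columns(df: pd.DataFrame):
    """Hypothetical helper: numeric columns stay numeric, string columns
    become one-hot indicator columns over the values seen in df."""
    columns = []
    for key in df.columns:
        if pd.api.types.is_numeric_dtype(df[key]):
            columns.append(tf.feature_column.numeric_column(key=key))
        else:
            # Vocabulary = distinct non-null values in this frame, as strings.
            vocab = sorted({str(v) for v in df[key].dropna()})
            categorical = tf.feature_column.categorical_column_with_vocabulary_list(
                key=key, vocabulary_list=vocab)
            # DNNClassifier expects dense inputs, so wrap in an indicator column.
            columns.append(tf.feature_column.indicator_column(categorical))
    return columns


# Hypothetical miniature frame standing in for the real training data.
sample = pd.DataFrame({
    "GroupNumber": [1, 2],
    "GroupSkill1": ["python", "java"],
    "ProjectNumber": [10, 11],
})
feature_columns = build_feature_columns(sample)
print([col.name for col in feature_columns])
```

The resulting list could be passed as `feature_columns=` to the classifier in place of the all-numeric list. Because the vocabulary comes from the training frame, skill values that appear only in the test set are treated as out-of-vocabulary (with the default `num_oov_buckets=0` they contribute an all-zero indicator).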

