Linear regression - TensorFlow

Problem description

I am trying to build a model that predicts wine quality from wine data. I get this error:

ValueError: Feature alcohool is not in features dictionary.

But I ran print(feature_columns), and this is the output:

[NumericColumn(key='fixed acidity', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='volatile acidity', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='citric acid', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='residual sugar', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='chlorides', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='free sulfur dioxide', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='total sulfur dioxide', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='density', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='pH', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='sulphates', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='alcohool', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='quality', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None)]

alcohool is in the list, so I don't understand what is happening. The error occurs at linear_est.train(train_input_fn), when I try to train my model.

My model looks like this:

dftrain = pd.read_csv('winequality-red.csv').head(790)
dfeval = pd.read_csv('winequality-red.csv').tail(809)
y_train = dftrain.pop('quality')
y_eval = dfeval.pop('quality')


CATEGORICAL_COLUMNS = []
NUMERIC_COLUMNS = ['fixed acidity','volatile acidity','citric acid', 'residual sugar', 'chlorides', 'free sulfur dioxide', 'total sulfur dioxide','density','pH','sulphates','alcohool','quality']
feature_columns = []
for feature_name in CATEGORICAL_COLUMNS:
  vocabulary = dftrain[feature_name].unique()  # gets a list of all unique values from the given feature column
  feature_columns.append(tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary))
  
for feature_name in NUMERIC_COLUMNS:
  feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype = tf.float64))

Input function:

# INPUT FUNCTION
def make_input_fn(data_df, label_df, num_epochs=1000, shuffle=True, batch_size=32):
  def input_function():  # inner function, this will be returned
    ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df))  # create tf.data.Dataset object with data and its label
    if shuffle:
      ds = ds.shuffle(1000)  # randomize order of data
    ds = ds.batch(batch_size).repeat(num_epochs)  # split dataset into batches of 32 and repeat process for number of epochs
    return ds  # return a batch of the dataset
  return input_function  # return a function object for use

train_input_fn = make_input_fn(dftrain, y_train)  # here we will call the input_function that was returned to us to get a dataset object we can feed to the model
eval_input_fn = make_input_fn(dfeval, y_eval, num_epochs=1, shuffle=False)


linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns)
# We create a linear estimator by passing the feature columns we created earlier


linear_est.train(train_input_fn)  # train
result = linear_est.evaluate(eval_input_fn)  # get model metrics/stats by evaluating on the evaluation data

clear_output()  # clears console output
print(result['accuracy'])  # the result variable is simply a dict of stats about our model

Tags: python, tensorflow, machine-learning

Solution
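
A likely cause, offered as a sketch rather than a confirmed diagnosis: the error is raised by the feature-column lookup. Each NumericColumn key must be a key of the features dictionary that input_function builds with dict(data_df), that is, a column name of the DataFrame. Printing feature_columns only shows the names that were requested, not the names that exist in the CSV. The check below compares the two lists. It assumes the CSV uses the usual red wine quality header, where the column is spelled alcohol; find_missing_features and csv_columns are hypothetical names introduced for illustration, and in the real script the second argument would be dftrain.columns.

```python
import difflib


def find_missing_features(feature_keys, available_columns):
    """Map every feature key that has no matching column to its closest column names."""
    available = list(available_columns)
    return {
        key: difflib.get_close_matches(key, available, n=1)
        for key in feature_keys
        if key not in available
    }


# Assumption: header of the usual red wine quality CSV, minus 'quality',
# which the question removes with dftrain.pop('quality').
csv_columns = [
    'fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
    'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
    'pH', 'sulphates', 'alcohol',
]

# NUMERIC_COLUMNS exactly as written in the question.
NUMERIC_COLUMNS = [
    'fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
    'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
    'pH', 'sulphates', 'alcohool', 'quality',
]

missing = find_missing_features(NUMERIC_COLUMNS, csv_columns)
for key, suggestions in missing.items():
    print(key, '->', suggestions or 'no similar column')
```

If the check reports the same two names on the real data, changing 'alcohool' to 'alcohol' and dropping 'quality' from NUMERIC_COLUMNS (it is the label and has already been popped from dftrain) should remove this ValueError. Separately, tf.estimator.LinearClassifier defaults to n_classes=2, so multi-valued quality scores may need either n_classes with remapped labels or tf.estimator.LinearRegressor; this has not been verified against the asker's data.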

