python - Linear regression - TensorFlow
Problem description
I am trying to build a model that predicts wine quality from wine data. I get this error:
ValueError: Feature alcohool is not in features dictionary.
But I ran print(feature_columns), and this is the output:
[NumericColumn(key='fixed acidity', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='volatile acidity', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='citric acid', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='residual sugar', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='chlorides', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='free sulfur dioxide', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='total sulfur dioxide', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='density', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='pH', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='sulphates', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='alcohool', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None), NumericColumn(key='quality', shape=(1,), default_value=None, dtype=tf.float64, normalizer_fn=None)]
alcohool is there, so I don't understand what is happening. The error occurs at linear_est.train(train_input_fn), when I try to train my model.
My model looks like this:
import pandas as pd
import tensorflow as tf

dftrain = pd.read_csv('winequality-red.csv').head(790)
dfeval = pd.read_csv('winequality-red.csv').tail(809)
y_train = dftrain.pop('quality')
y_eval = dfeval.pop('quality')
CATEGORICAL_COLUMNS = []
NUMERIC_COLUMNS = ['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar', 'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density', 'pH', 'sulphates', 'alcohool', 'quality']
feature_columns = []
for feature_name in CATEGORICAL_COLUMNS:
    vocabulary = dftrain[feature_name].unique()  # gets a list of all unique values from the given feature column
    feature_columns.append(tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary))
for feature_name in NUMERIC_COLUMNS:
    feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.float64))
Input function:
from IPython.display import clear_output  # assumed source of clear_output (notebook environment)

# INPUT FUNCTION
def make_input_fn(data_df, label_df, num_epochs=1000, shuffle=True, batch_size=32):
    def input_function():  # inner function, this will be returned
        ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df))  # create tf.data.Dataset object with data and its label
        if shuffle:
            ds = ds.shuffle(1000)  # randomize order of data
        ds = ds.batch(batch_size).repeat(num_epochs)  # split dataset into batches of batch_size and repeat for the number of epochs
        return ds  # return the batched dataset
    return input_function  # return a function object for use

train_input_fn = make_input_fn(dftrain, y_train)  # the returned input_function is called by the estimator to get a dataset object it can feed to the model
eval_input_fn = make_input_fn(dfeval, y_eval, num_epochs=1, shuffle=False)
linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns)
# We create a linear estimator by passing the feature columns we created earlier
linear_est.train(train_input_fn)  # train
result = linear_est.evaluate(eval_input_fn)  # get model metrics/stats by testing on testing data
clear_output()  # clears console output
print(result['accuracy'])  # the result variable is simply a dict of stats about our model
Solution
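No answer text survives on this page, so the following is only a hedged diagnostic sketch. print(feature_columns) shows the keys that were declared, not the columns that actually reach the model through dict(data_df). The error points at a key with no matching DataFrame column. Two candidates are visible in the question: 'alcohool' looks like a misspelling (the standard wine-quality CSV header spells it alcohol, which is an assumption about this particular file), and 'quality' has already been removed from dftrain by pop('quality') yet is still listed in NUMERIC_COLUMNS. The helper missing_feature_keys and the list ASSUMED_CSV_COLUMNS are hypothetical names introduced for illustration.

```python
def missing_feature_keys(feature_keys, available_columns):
    """Return the feature keys that have no matching column, sorted by name."""
    available = set(available_columns)
    return sorted(key for key in feature_keys if key not in available)


# Keys copied from NUMERIC_COLUMNS in the question.
NUMERIC_COLUMNS = ['fixed acidity', 'volatile acidity', 'citric acid',
                   'residual sugar', 'chlorides', 'free sulfur dioxide',
                   'total sulfur dioxide', 'density', 'pH', 'sulphates',
                   'alcohool', 'quality']

# ASSUMPTION: the columns left in dftrain after pop('quality'), using the
# standard header spelling 'alcohol'. Check the real file's header.
ASSUMED_CSV_COLUMNS = ['fixed acidity', 'volatile acidity', 'citric acid',
                       'residual sugar', 'chlorides', 'free sulfur dioxide',
                       'total sulfur dioxide', 'density', 'pH', 'sulphates',
                       'alcohol']

missing = missing_feature_keys(NUMERIC_COLUMNS, ASSUMED_CSV_COLUMNS)
print(missing)  # ['alcohool', 'quality']
```

In the actual script, the same check could be run as missing_feature_keys(NUMERIC_COLUMNS, dftrain.columns) just before training. If it reports these two keys, renaming 'alcohool' to match the CSV header and dropping 'quality' from NUMERIC_COLUMNS (it is the label, not a feature) should remove this particular error.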