python - Crossing categorical and continuous feature columns in TensorFlow
Problem description
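When using TensorFlow estimators with feature_column, you can cross categorical columns and bucketized continuous columns with crossed_column, but you cannot cross a categorical column with a numeric one. Is it possible to implement this starting from https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/feature_column/feature_column.py#L704?
It would also be great to see any alternative way of achieving the same result inside the TensorFlow graph.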
import numpy as np
cont = np.array([1,2,3])
cat = np.array(['cat', 'dog', 'cat'])
# Desired result of crossing cat with cont, i.e. cross_function(cat, cont):
expected = np.array([[1, 0], [0, 2], [3, 0]])
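For reference, the desired output can be reproduced in plain NumPy, outside any TensorFlow graph. This is a minimal sketch that assumes the categories are ordered as np.unique returns them (sorted), so 'cat' maps to column 0 and 'dog' to column 1; cross_function is the hypothetical name used in the question, not an existing API.

```python
import numpy as np


def cross_function(cat, cont):
    """Cross a categorical array with a continuous array (NumPy reference sketch).

    Each output row has one column per category; the continuous value is
    placed in the column of that row's category and every other entry is 0.
    """
    # np.unique returns the sorted categories and, for each element of `cat`,
    # the index of its category within that sorted array
    categories, codes = np.unique(cat, return_inverse=True)
    # One-hot encode the codes: row i of the identity matrix is category i
    onehot = np.eye(len(categories), dtype=cont.dtype)[codes]
    # Broadcast each continuous value across its own one-hot row
    return onehot * cont[:, np.newaxis]


cont = np.array([1, 2, 3])
cat = np.array(['cat', 'dog', 'cat'])
print(cross_function(cat, cont))
```

The one-hot matrix zeroes every column except the row's own category, so the multiplication is what turns the categorical value into a position and the continuous value into a magnitude.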
Solution
Answering my own question here. The steps involved are:
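- Encode the categorical feature numerically, inside the graph, so that the same encoding is applied in both training and serving
- One-hot encode the numeric result
- Multiply the one-hot encoding by the continuous variable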
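The steps above can be mirrored in NumPy to see what each stage produces, including how an unseen category behaves. This sketch assumes a fixed vocabulary (the hypothetical VOCAB below, standing in for categories) and imitates a lookup table built with num_oov_buckets=0 and default_value=-1: an unknown string gets id -1, and, as tf.one_hot does for out-of-range indices such as -1, the sketch encodes it as an all-zero row, so the crossed feature for that row is all zeros.

```python
import numpy as np

# Hypothetical fixed vocabulary, playing the role of `categories`
VOCAB = np.array(['cat', 'dog'])


def encode(cat, vocab=VOCAB, default_value=-1):
    # Step 1: map each string to its index in the vocabulary, -1 if unknown
    index = {value: i for i, value in enumerate(vocab)}
    return np.array([index.get(value, default_value) for value in cat])


def one_hot(ids, depth):
    # Step 2: one-hot encode; out-of-range ids (e.g. -1) give an all-zero row
    ids = np.asarray(ids)
    return (ids[:, np.newaxis] == np.arange(depth)).astype(np.float32)


def cross(cat, cont):
    # Step 3: multiply each one-hot row by its continuous value
    onehot = one_hot(encode(cat), len(VOCAB))
    return onehot * np.asarray(cont, dtype=np.float32)[:, np.newaxis]


print(cross(['cat', 'dog', 'bird'], [1.0, 2.0, 3.0]))
```

Here 'bird' is not in the vocabulary, so its row comes out as zeros rather than raising an error; whether that is acceptable depends on the model.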
Code:
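import numpy as np
import tensorflow as tf

cont = np.array([1, 2, 3])
cat = np.array(['cat', 'dog', 'cat'])
# Vocabulary of the categorical feature (sorted unique values)
categories = np.unique(cat)

def categorical_continuous_interaction(categorical_onehot, continuous):
    # Scale each one-hot row by its continuous value: the value lands in the
    # column of its category and all other columns stay 0.
    # tf.one_hot returns float32, so cast the continuous input to match
    cont = tf.expand_dims(tf.cast(continuous, categorical_onehot.dtype), 0)
    return tf.transpose(tf.multiply(tf.transpose(categorical_onehot), cont))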
def transformation_function(feature_dictionary, mapping_table):
    continuous_feature = feature_dictionary['cont']
    # Map category strings to integer ids (-1 for values not in categories)
    categorical_feature = mapping_table.lookup(feature_dictionary['cat'])
    # One-hot encode the ids; an id of -1 produces an all-zero row
    onehot = tf.one_hot(categorical_feature, categories.shape[0])
    cross_feature = categorical_continuous_interaction(onehot, continuous_feature)
    return {'feature_name': cross_feature}
def input_function(dataframe, label_key):  # original signature: (dataframe, label_key, ...)
    # Categorical mapping tables must be created outside the dataset
    # transformation function, but inside the input function
    mapping_table = tf.contrib.lookup.index_table_from_tensor(
        mapping=tf.constant(categories),
        num_oov_buckets=0,
        default_value=-1
    )
    # Build a dataset of dicts with one entry per DataFrame column
    dataset = tf.data.Dataset.from_tensor_slices(dict(dataframe))
    # Convert to a dataset of (features, labels) tuples
    # (split_label is a helper not shown in the original answer)
    dataset = dataset.map(lambda x: split_label(x, label_key))
    # Transform the features dict inside the dataset pipeline
    dataset = dataset.map(lambda features, labels: (transformation_function(
        features, mapping_table=mapping_table), labels))
    ...  # remaining steps elided in the original
    return dataset
def serving_input_fn():
    # Build the same mapping table used for training, inside the serving
    # input function and outside the transformation function
    mapping_table = tf.contrib.lookup.index_table_from_tensor(
        mapping=tf.constant(categories),
        num_oov_buckets=0,
        default_value=-1
    )
    # numeric_feature_column_names and categorical_feature_column_names are
    # lists of feature names that are not shown in the original answer
    numeric_receiver_tensors = {
        name: tf.placeholder(dtype=tf.float32, shape=[1], name=name + "_placeholder")
        for name in numeric_feature_column_names
    }
    categorical_receiver_tensors = {
        name: tf.placeholder(dtype=tf.string, shape=[1], name=name + "_placeholder")
        for name in categorical_feature_column_names
    }
    receiver_tensors = {**numeric_receiver_tensors, **categorical_receiver_tensors}
    # Apply exactly the same transformation as during training
    features = transformation_function(receiver_tensors,
                                       mapping_table=mapping_table)
    return tf.estimator.export.ServingInputReceiver(features, receiver_tensors)
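The double transpose in categorical_continuous_interaction only serves to line the continuous values up with the one-hot rows. The sketch below uses NumPy as a stand-in for the TensorFlow ops (an illustration, not the answer's code) to check that it matches plain broadcasting along a new last axis; in TensorFlow that alternative would read categorical_onehot * tf.expand_dims(continuous, -1).

```python
import numpy as np


def interaction_transpose(onehot, cont):
    # Mirrors the answer: transpose, multiply by a [1, N] row, transpose back
    return (onehot.T * np.expand_dims(cont, 0)).T


def interaction_broadcast(onehot, cont):
    # Alternative: give cont a trailing axis so it broadcasts across columns
    return onehot * np.expand_dims(cont, -1)


onehot = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])  # 'cat', 'dog', 'cat'
cont = np.array([1.0, 2.0, 3.0])
print(np.array_equal(interaction_transpose(onehot, cont),
                     interaction_broadcast(onehot, cont)))
```

Both forms give the [[1, 0], [0, 2], [3, 0]] matrix from the question; the broadcasting version simply avoids the two transposes.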