How do I change the dimensions of a tensor in TensorFlow Transform and feed it to the Trainer?

Problem description

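I am working on a data traffic project. My job is to build a pipeline on Google Cloud Platform using TensorFlow and Kubeflow. I have been trying to change the dimensions of the ExampleGen output in the Transform component and use the Transform output in the Trainer as the input to a model. I need the data as tensors in the model, but the Transform output is saved as classes. Below are my "preprocessing" and "features" code, which are part of the Transform component.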

#Preprocessing
from __future__ import division
from __future__ import print_function

import tensorflow as tf
import tensorflow_transform as tft

from models import features
import numpy as np
import pandas as pd


def _fill_in_missing(x):
    """Replace missing values in a SparseTensor.

    Fills in missing values of `x` with '' or 0, and converts to a dense tensor.

    Args:
        x: A `SparseTensor` of rank 2. Its dense shape should have size at most 1
            in the second dimension.

    Returns:
        A rank 1 tensor where missing values of `x` have been filled in.
    """
    if isinstance(x, tf.sparse.SparseTensor):
        default_value = '' if x.dtype == tf.string else 0
        dense_tensor = tf.sparse.to_dense(
            tf.SparseTensor(x.indices, x.values, [x.dense_shape[0], 1]),
            default_value)
    else:
        # Already dense; assumed to have shape [batch_size, 1].
        dense_tensor = x

    return tf.squeeze(dense_tensor, axis=1)


def preprocessing_fn(inputs):
    """tf.transform's callback function for preprocessing inputs.

    Args:
        inputs: map from feature keys to raw not-yet-transformed features.

    Returns:
        Map from string feature key to transformed feature operations.
    """
    outputs = {}
    for key in features.DENSE_FLOAT_FEATURE_KEYS:
        # Fill missing values with 0, then scale the dense float to a z-score.
        outputs[features.transformed_name(key)] = tft.scale_to_z_score(
            _fill_in_missing(inputs[key]))

    # Do not transform the label, as that would result in wrong evaluation.
    outputs[features.transformed_name(
        features.LABEL_KEY)] = inputs[features.LABEL_KEY]

    return outputs

#features
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from typing import Text, List, Any

import os
import tempfile
import numpy as np


# At least one feature is needed.

# Name of features which have continuous float values. These features will be
# used as their own values.
# The columns are named '1' through '288'.
DENSE_FLOAT_FEATURE_KEYS = [str(i) for i in range(1, 289)]


# Keys
LABEL_KEY = '0'


def transformed_name(key: Text) -> Text:
    """Generate the name of the transformed feature from original name."""
    return key + '_x'


def transformed_names(keys: List[Text]) -> List[Text]:
    """Transform multiple feature names at once."""
    return [transformed_name(key) for key in keys]
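
As a quick sanity check of the naming helpers above, the following self-contained snippet rebuilds the key list and prints the names that the transformed features would carry. The two helper functions are copied from the listing; building the keys with a comprehension is an assumption that the columns are named '1' through '288', as the list suggests.

```python
from typing import List

# Assumption: feature columns are named '1' .. '288', the label column is '0'.
DENSE_FLOAT_FEATURE_KEYS = [str(i) for i in range(1, 289)]
LABEL_KEY = '0'


def transformed_name(key: str) -> str:
    """Generate the name of the transformed feature from the original name."""
    return key + '_x'


def transformed_names(keys: List[str]) -> List[str]:
    """Transform multiple feature names at once."""
    return [transformed_name(key) for key in keys]


names = transformed_names(DENSE_FLOAT_FEATURE_KEYS)
print(len(names))                    # 288
print(names[:3])                     # ['1_x', '2_x', '3_x']
print(transformed_name(LABEL_KEY))   # 0_x
```

These are the keys the Trainer would have to look up when it reads the transformed examples.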

Tags: python, tensorflow, google-cloud-platform, kubeflow, kubeflow-pipelines

Solution
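
One possible approach, offered as a sketch rather than a confirmed answer: keep one scalar feature per column in the Transform output, and combine the columns into a single tensor with plain TensorFlow ops (`tf.stack`, `tf.expand_dims`) before the first model layer. The helper name `stack_features`, the shortened four-column key list, and the target shape `[batch, n, 1]` are assumptions made for illustration. In a real pipeline the dictionary would come from parsing the transformed examples, for instance with the feature spec returned by `tft.TFTransformOutput(...).transformed_feature_spec()`.

```python
import tensorflow as tf

# Hypothetical stand-in for models/features.py, shortened to 4 columns.
FEATURE_KEYS = [str(i) for i in range(1, 5)]


def transformed_name(key):
    """Same naming rule as in the question's features module."""
    return key + '_x'


def stack_features(features, keys=FEATURE_KEYS):
    """Combine per-column tensors into one tensor of shape [batch, n, 1].

    Each entry of `features` is assumed to hold one value per example,
    with shape [batch] or [batch, 1].
    """
    columns = [
        tf.cast(tf.reshape(features[transformed_name(k)], [-1]), tf.float32)
        for k in keys
    ]
    stacked = tf.stack(columns, axis=1)       # shape [batch, n]
    return tf.expand_dims(stacked, axis=-1)   # shape [batch, n, 1]


# Toy batch of 3 examples standing in for parsed Transform output.
batch = {transformed_name(k): tf.constant([0.1, 0.2, 0.3])
         for k in FEATURE_KEYS}
x = stack_features(batch)
print(x.shape)  # (3, 4, 1)
```

The same helper could also be called at the end of `preprocessing_fn`, so that the Transform component itself emits one combined feature. Whether that fits depends on the input shape the model expects.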

