Keeping feature definitions in a dictionary and returning features to the client

Problem description

I have the following dictionary, which stores the feature definitions as strings.

    features = {
        "journey_email_been_sent_flag": "F.when(F.col('email_14days') > 0,F.lit(1)).otherwise(F.lit(0))",
        "journey_opened_flag": "F.when(F.col('opened_14days') > 0, F.lit(1)).otherwise(F.lit(0))",
    }
    retrieved_features = {}
    non_retrieved_features = {}

Alternatively, I store the definitions themselves, as Column expressions.

    from pyspark.sql import functions as F

    features = {
        "journey_email_been_sent_flag": F.when(F.col('email_14days') > 0, F.lit(1)).otherwise(F.lit(0)),
        "journey_opened_flag": F.when(F.col('opened_14days') > 0, F.lit(1)).otherwise(F.lit(0)),
    }

The following code is then used to retrieve the feature definitions.

    def feature_extract(*featurenames):
        for featurename in featurenames:
            if featurename in features:
                print(f"{featurename} : {features[featurename]}")
                retrieved_features[featurename] = features[featurename]
            else:
                print('failure')
                non_retrieved_features[featurename] = "Not found in the feature definition"
        # Returns the whole dictionary of retrieved features, not a single Column
        return retrieved_features

This is how I call the function to retrieve features.

    feature_extract('journey_email_been_sent_flag', 'journey_opened_flag')

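However, it does not work when I try to retrieve a feature. When I keep the definitions themselves in the dictionary, I get the following result:

    Out[19]: {'journey_email_been_sent_flag': Column<b'CASE WHEN (email_14days > 0) THEN 1 ELSE 0 END'>}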

When I use the feature retrieval on a DataFrame like this:

    .withColumn('journey_email_been_sent_flag', feature_extract('journey_email_been_sent_flag'))

I get the following error:

    AssertionError: col should be Column
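The assertion is consistent with `feature_extract` returning the whole `retrieved_features` dictionary rather than a single Column. The sketch below reproduces only that return-type behaviour. It assumes a placeholder string in place of the real PySpark expression so that it runs without Spark, and it omits the `print` calls and the not-found branch.

```python
# Placeholder string stands in for the real Column expression (assumption,
# used only so the sketch runs without a Spark installation).
features = {"journey_email_been_sent_flag": "<Column expression>"}
retrieved_features = {}


def feature_extract(*featurenames):
    # Same lookup logic as in the question, minus printing and the else branch
    for featurename in featurenames:
        if featurename in features:
            retrieved_features[featurename] = features[featurename]
    return retrieved_features


result = feature_extract("journey_email_been_sent_flag")
print(type(result).__name__)  # the function hands back a dict

# The single definition has to be taken out of the dict by key
value = result["journey_email_been_sent_flag"]
```

Passing `result` itself to `withColumn` would hand it a dict, whereas `value` is the entry stored under the feature name.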

Tags: python, pyspark, feature-extraction, feature-selection

Solution


I was able to fix it this way.

I kept the feature definitions as actual Column expressions:

    features = {
        "journey_email_been_sent_flag": F.when(F.col('email_14days') > 0, F.lit(1)).otherwise(F.lit(0)),
        "journey_opened_flag": F.when(F.col('opened_14days') > 0, F.lit(1)).otherwise(F.lit(0)),
    }

and wrapped the call to feature_extract in F.lit, taking the single feature out of the returned dictionary with .get:

    F.lit(feature_extract('journey_email_been_sent_flag').get('journey_email_been_sent_flag'))
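The part of that call that appears to matter is the `.get(...)` lookup, which pulls one Column out of the dictionary. As far as I know, `F.lit` returns a Column argument unchanged, so the wrapper is probably optional. A minimal sketch of a variant that returns the Column directly is shown below. `get_feature`, `add_features`, and `demo` are hypothetical names that are not part of the original post, and the sample rows in `demo` are made up for illustration.

```python
def get_feature(features, name):
    """Return the single definition stored under `name` (hypothetical helper)."""
    if name not in features:
        raise KeyError(f"{name} not found in the feature definitions")
    return features[name]


def add_features(df, features, *names):
    """Add one column per requested feature to a DataFrame."""
    for name in names:
        df = df.withColumn(name, get_feature(features, name))
    return df


def demo():
    # Requires pyspark; imported here so the helpers above load without it.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    features = {
        "journey_email_been_sent_flag": F.when(F.col("email_14days") > 0, F.lit(1)).otherwise(F.lit(0)),
        "journey_opened_flag": F.when(F.col("opened_14days") > 0, F.lit(1)).otherwise(F.lit(0)),
    }
    # Illustrative sample data, not taken from the original question
    df = spark.createDataFrame([(3, 0), (0, 2)], ["email_14days", "opened_14days"])
    add_features(df, features, "journey_email_been_sent_flag", "journey_opened_flag").show()
```

Calling `demo()` in an environment with PySpark installed should add both flag columns. Raising `KeyError` for an unknown name is a design choice of this sketch, in place of recording the miss in `non_retrieved_features`.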
