How do I specify Databricks notebook widgets through DatabricksStep using the PipelineParameter of a published pipeline?

Problem description

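I created a pipeline with several steps (azureml-defaults==1.23.0). When I run the published pipeline from the studio, the Databricks step always uses the default value of each PipelineParameter, no matter what value I choose when submitting the pipeline.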

parser.add_argument('--import_date',         type=str, default = "2021-04-23")   
....
....

import_date      = PipelineParameter(name="import_date"       , default_value = params.import_date)
cluster_id       = PipelineParameter(name="cluster_id"        , default_value = params.cluster_id)
step_type        = PipelineParameter(name="step_type"         , default_value = params.step_type)
churn_months     = PipelineParameter(name="churnMonths"       , default_value = params.churnMonths)



data_import_step = DatabricksStep(name="Databricks Data Import Step",
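                                  # .default_value is read once, when the pipeline is built,
                                  # so the cluster id is fixed to the default at publish time
                                  existing_cluster_id=str(cluster_id.default_value),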
                                  notebook_path=import_notebook_path,
                                  notebook_params={'ChurnMonthsWidget': churn_months, 
                                                   'startDateWidget'  : port_start_date,
                                                   'ImportDateWidget' : import_date,
                                                   'StepTypeWidget'   : step_type},
                                  run_name='Job_Data_Import',
                                  compute_target=databricks_compute,
                                  allow_reuse=False)
.....
.....
.....
pipeline_steps = StepSequence(steps=[data_import_step                  #Step 1
                                    ,data_manipulation_step            #Step 2
                                    ,data_extraction_step              #Step 3
                                    ,training_data_preparation_step    #Step 4
                                    ,model_training_step               #Step 5
                                    ,prediction_data_preparation_step  #Step 6
                                    ,prediction_step                   #Step 7
                                    ])
                                    
pipeline = Pipeline(workspace = ws, steps=pipeline_steps)

published_pipeline = pipeline.publish(name        = params.pipeline_name,
                                         description = params.pipeline_description)

The default value of import_date is 2021-04-23. Even when I set the import_date parameter to "2021-04-22" at submission time, the Databricks notebook still receives 2021-04-23 as import_date.
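One way to narrow this down might be to submit the published pipeline outside the studio UI and see whether the override reaches the notebook there. The sketch below only builds the JSON body that a published pipeline's REST endpoint accepts (the `ExperimentName` and `ParameterAssignments` keys). The helper name `build_submit_payload` and the experiment name `churn-debug` are hypothetical, and the actual POST is left as a comment because it needs workspace credentials.

```python
import json


def build_submit_payload(experiment_name, overrides):
    """Build the JSON body for a published-pipeline REST submission.

    Keys in `overrides` must match the PipelineParameter `name` values
    (e.g. "import_date", "churnMonths"), not the Python variable names.
    """
    if not experiment_name:
        raise ValueError("experiment_name must be non-empty")
    return {
        "ExperimentName": experiment_name,
        "ParameterAssignments": dict(overrides),
    }


payload = build_submit_payload(
    "churn-debug",                  # hypothetical experiment name
    {"import_date": "2021-04-22"},  # override the 2021-04-23 default
)
print(json.dumps(payload, indent=2))

# To actually submit (requires credentials, so not executed here):
#   import requests
#   from azureml.core.authentication import InteractiveLoginAuthentication
#   auth_header = InteractiveLoginAuthentication().get_authentication_header()
#   requests.post(published_pipeline.endpoint, headers=auth_header, json=payload)
```

If the notebook receives 2021-04-22 when submitted this way, that would point at how the studio form passes values; if it still receives 2021-04-23, that would suggest the default is being fixed when the pipeline is built.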

The Databricks notebook has the following widgets:


today = str(date.today())
dbutils.widgets.text("ImportDateWidget", today, label = "ImportDate")
import_date = dbutils.widgets.get("ImportDateWidget")


startDate = "2019-01-01"  
dbutils.widgets.text("startDateWidget", startDate, label = "startDate")
start_date = dbutils.widgets.get("startDateWidget")

ChurnMonths = 3
dbutils.widgets.text("ChurnMonthsWidget", str(ChurnMonths), label = "ChurnMonths")

step_type = "Training"
dbutils.widgets.text("StepTypeWidget",step_type,label ="StepType" )

N_MONTHS_CHURN = int(dbutils.widgets.get("ChurnMonthsWidget"))
import_date    = dbutils.widgets.get("ImportDateWidget")
start_date     = dbutils.widgets.get("startDateWidget")
N_MONTHS_CHURN = int(dbutils.widgets.get("ChurnMonthsWidget"))
step_type = dbutils.widgets.get("StepTypeWidget")
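To check what the notebook actually receives on each run, it may help to read all widgets in one place and print the raw values before using them. This is a minimal sketch, not part of the original notebook: `read_widgets` is a hypothetical helper, and a plain dictionary stands in for `dbutils.widgets.get` so the example runs outside Databricks.

```python
def read_widgets(get, specs):
    """Read each widget through `get(name)` and convert it with its caster.

    `get` is any callable that maps a widget name to its string value;
    in a Databricks notebook that would be `dbutils.widgets.get`.
    """
    values = {}
    for name, cast in specs.items():
        raw = get(name)
        print(f"{name} = {raw!r}")  # shows up in the notebook run output
        values[name] = cast(raw)
    return values


# Stand-in for the widget values a pipeline run would pass (assumed data).
fake_widgets = {
    "ImportDateWidget": "2021-04-22",
    "startDateWidget": "2019-01-01",
    "ChurnMonthsWidget": "3",
    "StepTypeWidget": "Training",
}

widget_values = read_widgets(
    fake_widgets.__getitem__,
    {
        "ImportDateWidget": str,
        "startDateWidget": str,
        "ChurnMonthsWidget": int,  # widgets always return strings
        "StepTypeWidget": str,
    },
)
```

In the notebook itself the call would be `read_widgets(dbutils.widgets.get, {...})`, and the printed lines make it easy to compare the received values with what was entered at submission time.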

Tags: azure-databricks, azure-machine-learning-studio, azure-machine-learning-service, azureml, azureml-python-sdk

Solution

