Custom Docker image with the Databricks Jobs API

Problem description

Is there a way to create an ephemeral job cluster with a custom Docker image in Azure Databricks? I can only find information about creating regular clusters with Databricks Container Services.

The job definition JSON that I want to send to the azuredatabricks.net/api/2.0/jobs/create API looks like this:

{
    "databricks_pool_name": "test",
    "job_settings": {
        "name": "job-test",
        "new_cluster": {
            "num_workers": 1,
            "spark_version": "7.3.x-scala2.12",
            "instance_pool_id": "<INSTANCE_POOL_PLACEHOLDER>",
            "docker_image": {
                "url": "<ACR_HOST_NAME>",
                "basic_auth": {
                    "username": "<ACR_USER>",
                    "password": "<ACR_TOKEN>"
                }
            }
        },
        "max_concurrent_runs": 1,
        "max_retries": 0,
        "schedule": {
            "quartz_cron_expression": "0 0 0 2 * ?",
            "timezone_id": "UTC"
        },
        "spark_python_task": {
            "python_file": "dbfs:/poc.py"
        },
        "timeout_seconds": 5400
    }
}

Tags: docker, rest, pyspark, databricks, azure-databricks

Solution


The JSON is structured incorrectly - if you look at the documentation for the Jobs API, you will see that you only need to send the contents of the job_settings field, without the databricks_pool_name and job_settings wrapper keys:

{
  "name": "job-test",
  "new_cluster": {
    "num_workers": 1,
    "spark_version": "7.3.x-scala2.12",
    "instance_pool_id": "<INSTANCE_POOL_PLACEHOLDER>",
    "docker_image": {
      "url": "<ACR_HOST_NAME>",
      "basic_auth": {
        "username": "<ACR_USER>",
        "password": "<ACR_TOKEN>"
      }
    }
  },
  "max_concurrent_runs": 1,
  "max_retries": 0,
  "schedule": {
    "quartz_cron_expression": "0 0 0 2 * ?",
    "timezone_id": "UTC"
  },
  "spark_python_task": {
    "python_file": "dbfs:/poc.py"
  },
  "timeout_seconds": 5400
}
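For completeness, here is a minimal sketch of submitting that corrected payload from Python using only the standard library. The `create_job` helper and the `DATABRICKS_HOST` / `DATABRICKS_TOKEN` environment variables are illustrative assumptions, not part of the Jobs API itself; the placeholder values in the payload still need to be replaced with your own.

```python
import json
import os
import urllib.request

# The Jobs API expects the job settings at the TOP level of the request
# body - no "databricks_pool_name" or "job_settings" wrapper around them.
job_settings = {
    "name": "job-test",
    "new_cluster": {
        "num_workers": 1,
        "spark_version": "7.3.x-scala2.12",
        "instance_pool_id": "<INSTANCE_POOL_PLACEHOLDER>",
        "docker_image": {
            "url": "<ACR_HOST_NAME>",
            "basic_auth": {"username": "<ACR_USER>", "password": "<ACR_TOKEN>"},
        },
    },
    "max_concurrent_runs": 1,
    "max_retries": 0,
    "schedule": {"quartz_cron_expression": "0 0 0 2 * ?", "timezone_id": "UTC"},
    "spark_python_task": {"python_file": "dbfs:/poc.py"},
    "timeout_seconds": 5400,
}


def create_job(host: str, token: str, settings: dict) -> int:
    """POST the settings to /api/2.0/jobs/create and return the new job_id."""
    req = urllib.request.Request(
        f"{host}/api/2.0/jobs/create",
        data=json.dumps(settings).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["job_id"]


if __name__ == "__main__" and "DATABRICKS_HOST" in os.environ:
    # Assumed env vars (hypothetical names): DATABRICKS_HOST, e.g.
    # https://adb-<id>.<n>.azuredatabricks.net, and DATABRICKS_TOKEN,
    # a personal access token for the workspace.
    job_id = create_job(
        os.environ["DATABRICKS_HOST"], os.environ["DATABRICKS_TOKEN"], job_settings
    )
    print(f"Created job {job_id}")
```

Note that the cluster definition still goes under `new_cluster`; it is only the outer wrapper from the question that has to be removed.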
