Workload Identity and service accounts for Composer 2 / GKE Autopilot cluster PodOperator tasks

Problem description

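I'm trying to run GKEStartPodOperator/KubernetesPodOperator tasks in a Composer 2 environment, which uses a GKE cluster in Autopilot mode. We have an existing Composer 1 environment whose GKE cluster is not in Autopilot mode. Our tasks that authenticate to Google Cloud Platform services (BigQuery, GCS, etc.) fail with a 401 Unauthorized error in the Composer 2 environment but succeed in the Composer 1 environment.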

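In the log files, I can see that the tasks in both environments obtain their credentials by sending requests to the metadata server. The main difference is that the Composer 1 tasks request the service account assigned to the node the task runs on, whereas the Composer 2 tasks appear to request the workload identity pool, e.g. [project-name].svc.id.goog.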
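One way to see which identity a pod is actually given is to repeat the lookup google.auth performs in the logs below: ask the metadata server what the "default" service account resolves to. This is a minimal standard-library sketch, assuming it runs inside the task's pod (the metadata server is only reachable from GCP workloads); the function names are hypothetical. Going by the logs, a Composer 1 pod would be expected to report the node's [project-id]-compute@developer.gserviceaccount.com account, and a Composer 2 pod without a Workload Identity binding the [project-name].svc.id.goog pool.

```python
import urllib.request

# Metadata server root that google.auth queries (as seen in the logs).
METADATA_ROOT = "http://metadata.google.internal/computeMetadata/v1"


def build_metadata_request(path: str) -> urllib.request.Request:
    """Build a GET request for a metadata path.

    The metadata server rejects requests without the Metadata-Flavor header.
    """
    return urllib.request.Request(
        f"{METADATA_ROOT}/{path.lstrip('/')}",
        headers={"Metadata-Flavor": "Google"},
    )


def default_service_account_email(timeout: float = 2.0) -> str:
    """Return the email the metadata server reports for the 'default' account.

    Only works from inside a GCP workload such as the task's pod; elsewhere the
    hostname does not resolve and urllib raises URLError.
    """
    req = build_metadata_request("instance/service-accounts/default/email")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()
```

Calling print(default_service_account_email()) from inside the task (for example as the pod's entry point) shows the identity the Google client libraries will use.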

The Composer 1 log is:

[2021-10-22 12:38:01,349] {pod_launcher.py:148} INFO - DEBUG:google.auth._default:Checking None for explicit credentials as part of auth process...
[2021-10-22 12:38:01,351] {pod_launcher.py:148} INFO - DEBUG:google.auth._default:Checking Cloud SDK credentials as part of auth process...
[2021-10-22 12:38:01,352] {pod_launcher.py:148} INFO - DEBUG:google.auth._default:Cloud SDK credentials not found on disk; not using them
[2021-10-22 12:38:01,359] {pod_launcher.py:148} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://[cluster-ip]
[2021-10-22 12:38:01,374] {pod_launcher.py:148} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://metadata.google.internal/computeMetadata/v1/project/project-id
[2021-10-22 12:38:01,392] {pod_launcher.py:148} INFO - DEBUG:google.auth._default:Checking None for explicit credentials as part of auth process...
[2021-10-22 12:38:01,393] {pod_launcher.py:148} INFO - DEBUG:google.auth._default:Checking Cloud SDK credentials as part of auth process...
[2021-10-22 12:38:01,393] {pod_launcher.py:148} INFO - DEBUG:google.auth._default:Cloud SDK credentials not found on disk; not using them
[2021-10-22 12:38:01,395] {pod_launcher.py:148} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://[cluster-ip]
[2021-10-22 12:38:01,398] {pod_launcher.py:148} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://metadata.google.internal/computeMetadata/v1/project/project-id
[2021-10-22 12:38:01,412] {pod_launcher.py:148} INFO - DEBUG:google.cloud.bigquery.opentelemetry_tracing:This service is instrumented using OpenTelemetry. OpenTelemetry could not be imported; please add opentelemetry-api and opentelemetry-instrumentation packages in order to get BigQuery Tracing data.
[2021-10-22 12:38:01,414] {pod_launcher.py:148} INFO - DEBUG:urllib3.util.retry:Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)
[2021-10-22 12:38:01,415] {pod_launcher.py:148} INFO - DEBUG:google.auth.transport.requests:Making request: GET http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true
[2021-10-22 12:38:01,437] {pod_launcher.py:148} INFO - DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): metadata.google.internal:80
[2021-10-22 12:38:01,452] {pod_launcher.py:148} INFO - DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/default/?recursive=true HTTP/1.1" 200 226
[2021-10-22 12:38:01,454] {pod_launcher.py:148} INFO - DEBUG:google.auth.transport.requests:Making request: GET http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/[project-id]-compute@developer.gserviceaccount.com/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform
[2021-10-22 12:38:01,463] {pod_launcher.py:148} INFO - DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/[project-id]-compute@developer.gserviceaccount.com/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform HTTP/1.1" 200 1049
[2021-10-22 12:38:01,468] {pod_launcher.py:148} INFO - DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): bigquery.googleapis.com:443
[2021-10-22 12:38:02,028] {pod_launcher.py:148} INFO - DEBUG:urllib3.connectionpool:https://bigquery.googleapis.com:443 "POST /bigquery/v2/projects/[project-nam]/jobs?prettyPrint=false HTTP/1.1" 200 None

The Composer 2 log is:

[2021-10-21 13:56:06,619] {pod_launcher.py:149} INFO - DEBUG:google.auth._default:Checking None for explicit credentials as part of auth process...
[2021-10-21 13:56:06,620] {pod_launcher.py:149} INFO - DEBUG:google.auth._default:Checking Cloud SDK credentials as part of auth process...
[2021-10-21 13:56:06,620] {pod_launcher.py:149} INFO - DEBUG:google.auth._default:Cloud SDK credentials not found on disk; not using them
[2021-10-21 13:56:06,621] {pod_launcher.py:149} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://[cluster-ip]
[2021-10-21 13:56:06,624] {pod_launcher.py:149} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://metadata.google.internal/computeMetadata/v1/project/project-id
[2021-10-21 13:56:06,634] {pod_launcher.py:149} INFO - DEBUG:google.auth._default:Checking None for explicit credentials as part of auth process...
[2021-10-21 13:56:06,635] {pod_launcher.py:149} INFO - DEBUG:google.auth._default:Checking Cloud SDK credentials as part of auth process...
[2021-10-21 13:56:06,635] {pod_launcher.py:149} INFO - DEBUG:google.auth._default:Cloud SDK credentials not found on disk; not using them
[2021-10-21 13:56:06,635] {pod_launcher.py:149} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://[cluster-ip]
[2021-10-21 13:56:06,635] {pod_launcher.py:149} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://metadata.google.internal/computeMetadata/v1/project/project-id
[2021-10-21 13:56:06,641] {pod_launcher.py:149} INFO - DEBUG:google.cloud.bigquery.opentelemetry_tracing:This service is instrumented using OpenTelemetry. OpenTelemetry could not be imported; please add opentelemetry-api and opentelemetry-instrumentation packages in order to get BigQuery Tracing data.
[2021-10-21 13:56:06,642] {pod_launcher.py:149} INFO - DEBUG:urllib3.util.retry:Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)
[2021-10-21 13:56:06,642] {pod_launcher.py:149} INFO - DEBUG:google.auth.transport.requests:Making request: GET http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true
[2021-10-21 13:56:06,714] {pod_launcher.py:149} INFO - DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): metadata.google.internal:80
[2021-10-21 13:56:06,720] {pod_launcher.py:149} INFO - DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/default/?recursive=true HTTP/1.1" 200 121
[2021-10-21 13:56:06,721] {pod_launcher.py:149} INFO - DEBUG:google.auth.transport.requests:Making request: GET http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/[project-name].svc.id.goog/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform
[2021-10-21 13:56:06,831] {pod_launcher.py:149} INFO - DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/[project-name].svc.id.goog/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform HTTP/1.1" 200 765
[2021-10-21 13:56:06,833] {pod_launcher.py:149} INFO - DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): bigquery.googleapis.com:443
[2021-10-21 13:56:06,866] {pod_launcher.py:149} INFO - DEBUG:urllib3.connectionpool:https://bigquery.googleapis.com:443 "POST /bigquery/v2/projects/[project-name]/jobs?prettyPrint=false HTTP/1.1" 401 None

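Based on the Workload Identity documentation, I think I need to bind a specific service account to the node/node pool that runs the pod, but I'm not sure how to do that with Composer 2 on GKE Autopilot, since the nodes are managed for me. Composer 2 currently has no documentation on using KubernetesPodOperator or GKEStartPodOperator.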

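In short, my question is: how should I configure the PodOperator tasks in my Composer 2 environment so that they use a specific service account to authenticate to GCP services?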

Tags: airflow, google-kubernetes-engine, service-accounts, google-cloud-composer, workload-identity

Solution


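With some guidance from an ops engineer, I now have a KubernetesPodOperator task that successfully authenticates to GCP services through a service account. I'll share the steps and some useful information below.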

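First, follow the steps in "Authenticating to Google Cloud using Workload Identity". I assumed Composer 2 had set up the Kubernetes <> Google Cloud service account binding and annotation for me, but it had not. Following the instructions, I had to create a namespace, a Kubernetes service account (KSA), the binding between the KSA and the Google service account (GSA), and the annotation on the KSA.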
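Once kubectl points at the environment's GKE cluster (for example via gcloud container clusters get-credentials), that setup comes down to four commands. The sketch below only assembles them, it does not run anything, so the pieces that must agree are easy to see: the KSA's namespace and name appear both in the IAM member string and in the annotation. All values are hypothetical placeholders, and the GSA still needs whatever roles the task itself uses (e.g. BigQuery access) granted separately.

```python
import shlex

# Hypothetical placeholder values: substitute your own.
PROJECT_ID = "my-project"    # project hosting the Composer 2 GKE cluster
NAMESPACE = "wi-namespace"   # Kubernetes namespace the pod will run in
KSA_NAME = "wi-ksa"          # Kubernetes service account
GSA_EMAIL = "task-runner@my-project.iam.gserviceaccount.com"  # Google service account


def workload_identity_member(project_id: str, namespace: str, ksa: str) -> str:
    """IAM member string that identifies a KSA in the project's workload identity pool."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa}]"


def setup_commands(project_id: str, namespace: str, ksa: str, gsa_email: str) -> list:
    """Return the kubectl/gcloud commands, as argv lists, for the KSA-GSA binding."""
    return [
        ["kubectl", "create", "namespace", namespace],
        ["kubectl", "create", "serviceaccount", ksa, "--namespace", namespace],
        # Let the KSA impersonate the GSA.
        ["gcloud", "iam", "service-accounts", "add-iam-policy-binding", gsa_email,
         "--role", "roles/iam.workloadIdentityUser",
         "--member", workload_identity_member(project_id, namespace, ksa)],
        # Tell GKE which GSA this KSA maps to.
        ["kubectl", "annotate", "serviceaccount", ksa, "--namespace", namespace,
         f"iam.gke.io/gcp-service-account={gsa_email}"],
    ]


# Print shell-quoted commands for review before running them by hand.
for argv in setup_commands(PROJECT_ID, NAMESPACE, KSA_NAME, GSA_EMAIL):
    print(shlex.join(argv))
```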

Second, I had to update my KubernetesPodOperator instance with the namespace and service_account_name parameters, setting them to the namespace and Kubernetes service account I created in the first step.
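A minimal sketch of that second step, assuming the hypothetical namespace wi-namespace and KSA wi-ksa from the first step and the public google/cloud-sdk:slim image, whose "gcloud auth list" output shows the identity the pod received. The operator's import path depends on the apache-airflow-providers-cncf-kubernetes version (older releases use operators.kubernetes_pod, newer ones operators.pod), so both are tried. The keyword arguments live in a plain function so they can be checked without Airflow installed.

```python
from datetime import datetime

# Hypothetical names: use the namespace and KSA created in the first step.
NAMESPACE = "wi-namespace"
KSA_NAME = "wi-ksa"


def pod_task_kwargs(task_id: str) -> dict:
    """Keyword arguments for a pod that authenticates as the KSA's bound GSA."""
    return {
        "task_id": task_id,
        "name": task_id.replace("_", "-"),  # pod names must be DNS-1123 compliant
        "namespace": NAMESPACE,             # namespace that owns the KSA
        "service_account_name": KSA_NAME,   # KSA annotated with the GSA email
        "image": "google/cloud-sdk:slim",   # assumed image; any image works
        "cmds": ["gcloud"],
        "arguments": ["auth", "list"],      # prints the active credentials
    }


try:
    from airflow import DAG
    try:
        from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator
    except ImportError:  # older provider releases
        from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
            KubernetesPodOperator,
        )
except ImportError:  # Airflow not installed: only the kwargs helper is usable
    DAG = None

if DAG is not None:
    with DAG(
        dag_id="workload_identity_check",
        start_date=datetime(2021, 10, 1),
        catchup=False,
    ) as dag:
        KubernetesPodOperator(**pod_task_kwargs("check_identity"))
```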

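After uploading the DAG and running the task, I can confirm that these two steps let my task request the bound Google service account, and from there the Google client library authentication succeeded in my test against BigQuery.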

