How to scale an Azure Kubernetes cluster via the Azure CLI

Problem description

When I try to scale my Azure Kubernetes cluster as described in the documentation, using the following command:

az aks scale --resource-group my-resource-group --name my-cluster --node-count 5 --nodepool-name default

I get:

cli.azure.cli.core.util : request failed: Error occurred in request., RetryError: HTTPSConnectionPool(host='management.azure.com', port=443): Max retries exceeded with url: /subscriptions/[subscriptionguid]/resourceGroups/my-resource-group/providers/Microsoft.ContainerService/managedClusters/my-cluster?api-version=2020-03-01 (Caused by ResponseError('too many 500 error responses',))
request failed: Error occurred in request., RetryError: HTTPSConnectionPool(host='management.azure.com', port=443): Max retries exceeded with url: /subscriptions/[subscriptionguid]/resourceGroups/my-resource-group/providers/Microsoft.ContainerService/managedClusters/my-cluster?api-version=2020-03-01 (Caused by ResponseError('too many 500 error responses',))

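I am on Azure CLI 2.3.1 on Windows. I have also tried 2.2 in WSL. I can scale fine through the UI. Autoscaling is disabled (enableAutoScaling is false). There is only one node pool (called default). The cluster was created via Terraform. Other az commands work fine. I have tried logging in both as a user and as a service principal. I am not behind a proxy. Adding --debug does not turn up anything of immediate value.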

If I look at the HTTP requests in Fiddler, the response body of the 500 responses looks like this:

message=The credentials in ServicePrincipalProfile were invalid. Please see https://aka.ms/aks-sp-help for more details. (Details: adal: Refresh request failed. Status Code = '401'. Response body: {"error":"invalid_client","error_description":"AADSTS7000215: Invalid client secret is provided.\r\nTrace ID: 4d0fe224-1e60-4a91-91f1-399f697c0600\r\nCorrelation ID: 95b7e354-a63d-450e-8a7c-1851605a5b25\r\nTimestamp: 2020-04-07 13:51:07Z","error_codes":[7000215],"timestamp":"2020-04-07 13:51:07Z","trace_id":"4d0fe224-1e60-4a91-91f1-399f697c0600","correlation_id":"95b7e354-a63d-450e-8a7c-1851605a5b25","error_uri":"https://login.microsoftonline.com/error?code=7000215"})
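The response body points at the cluster's own service principal rather than at the CLI login: Azure AD rejects the client secret stored in ServicePrincipalProfile. A minimal sketch of how one might pull the Azure AD error code out of a captured body and, if appropriate, give AKS a new secret. The `body` sample is a shortened copy of the JSON above, and `update_aks_sp_secret` is a hypothetical helper name. It wraps `az aks update-credentials`, which is not what the poster ended up using, and it assumes you already have a valid secret for the service principal.

```shell
# Shortened sample of the JSON embedded in the 500 response body.
body='{"error":"invalid_client","error_description":"AADSTS7000215: Invalid client secret is provided."}'

# Extract the first Azure AD error code (AADSTS followed by digits).
aad_error_code=$(printf '%s\n' "$body" | grep -oE 'AADSTS[0-9]+' | head -n 1)
echo "$aad_error_code"

# Hypothetical helper, defined but not called here.
# Arguments: resource group, cluster name, service principal app ID, new client secret.
update_aks_sp_secret() {
  az aks update-credentials \
    --resource-group "$1" \
    --name "$2" \
    --reset-service-principal \
    --service-principal "$3" \
    --client-secret "$4"
}
```

A call would look like `update_aks_sp_secret my-resource-group my-cluster "$SP_ID" "$SP_SECRET"`, where both variables are placeholders you supply. Because this cluster is managed by Terraform, changing credentials outside Terraform may show up as drift on the next plan.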

If I run:

az aks show --resource-group my-resource-group --name my-cluster --query agentPoolProfiles

The result is:

[
  {
    "availabilityZones": null,
    "count": 3,
    "enableAutoScaling": false,
    "enableNodePublicIp": null,
    "maxCount": null,
    "maxPods": 110,
    "minCount": null,
    "mode": "User",
    "name": "default",
    "nodeLabels": null,
    "nodeTaints": null,
    "orchestratorVersion": "1.15.7",
    "osDiskSizeGb": 30,
    "osType": "Linux",
    "provisioningState": "Succeeded",
    "scaleSetEvictionPolicy": null,
    "scaleSetPriority": null,
    "spotMaxPrice": null,
    "tags": null,
    "type": "AvailabilitySet",
    "vmSize": "Standard_D2_v3"
  }
]
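Most of this output is noise for the problem at hand. The `type` field is the one the answer suspects was relevant. A small sketch for listing only each pool's name and type, assuming the same resource group and cluster names. The helper names `pool_types` and `has_availability_set` are hypothetical, and the `--query` expression is a JMESPath projection passed to the CLI's global `--query` option.

```shell
# Hypothetical helper: print one "name<TAB>type" row per node pool.
# Arguments: resource group, cluster name.
pool_types() {
  az aks show \
    --resource-group "$1" \
    --name "$2" \
    --query "agentPoolProfiles[].[name, type]" \
    --output tsv
}

# Reads "name<TAB>type" rows on stdin and succeeds if any pool
# is backed by an availability set.
has_availability_set() {
  awk -F '\t' '$2 == "AvailabilitySet" { found = 1 } END { exit !found }'
}
```

For example, `pool_types my-resource-group my-cluster | has_availability_set && echo "availability set pool found"` would flag the cluster shown above.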

What am I doing wrong? How can I get AKS to scale via the CLI? Or, failing that, how can I debug it?

Tags: azure, azure-aks

Solution


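I eventually resolved this by upgrading to the latest versions of Terraform and the Terraform Azure provider (azurerm from 1.32.1 to 2.0, and Terraform from 0.12.17 to 0.12.24). I then deleted the cluster and let Terraform recreate it. It now scales fine from the command line. I suspect the relevant change was that the node pool type went from "AvailabilitySet" to "VirtualMachineScaleSets".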

