deployment - Readiness probe failed on the management proxy pod
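Problem
I am trying to set up SQL Server BDC on AKS, but the process does not seem to get past a certain point. The AKS cluster is a 3-node cluster built on a Standard_E8_v3 VM scale set.
Here is the list of pods:
C:\Users\rgn>kubectl get pods -n mssql-cluster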
NAME READY STATUS RESTARTS AGE
control-qm754 3/3 Running 0 35m
controldb-0 2/2 Running 0 35m
controlwd-wxrlg 1/1 Running 0 32m
logsdb-0 1/1 Running 0 32m
logsui-mqfcv 1/1 Running 0 32m
metricsdb-0 1/1 Running 0 32m
metricsdc-9frbb 1/1 Running 0 32m
metricsdc-jr5hk 1/1 Running 0 32m
metricsdc-ls7mf 1/1 Running 0 32m
metricsui-pn9qf 1/1 Running 0 32m
mgmtproxy-x4ctb 2/2 Running 0 32m
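In the pod list, the READY column shows ready containers/total containers, and Kubernetes only counts a pod as ready when every container in it passes its readiness probe. Below is a minimal Python sketch that lists any pod whose ready count is below its container count. It assumes the plain-text `kubectl get pods` layout shown above, with the header row first.

```python
# Minimal sketch: flag pods whose READY column (ready/total containers) shows
# fewer ready containers than total. Assumes the default plain-text layout of
# `kubectl get pods` (NAME READY STATUS RESTARTS AGE) with the header first.
from typing import List, Tuple


def not_ready_pods(kubectl_output: str) -> List[Tuple[str, str]]:
    """Return (name, ready) pairs for pods that are not fully ready."""
    result = []
    lines = kubectl_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2:
            continue  # ignore blank or truncated lines
        name, ready = fields[0], fields[1]
        ready_count, total = (int(n) for n in ready.split("/"))
        if ready_count < total:
            result.append((name, ready))
    return result


if __name__ == "__main__":
    sample = """NAME READY STATUS RESTARTS AGE
control-qm754 3/3 Running 0 35m
mgmtproxy-x4ctb 2/2 Running 0 32m
"""
    print(not_ready_pods(sample))
```

For anything beyond a quick check, `kubectl get pods -n mssql-cluster -o json` is a better choice. It returns structured output, with each container's `ready` flag under `status.containerStatuses`, so the check does not depend on the column layout.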
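When I run describe on the mgmtproxy-x4ctb pod, I see the following. Even though the status says it is running, it is not: the readiness probe fails. I believe this is why the deployment is not progressing.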
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 11m default-scheduler Successfully assigned mssql-cluster/mgmtproxy-x4ctb to aks-agentpool-34156060-vmss000002
Normal Pulling 11m kubelet, aks-agentpool-34156060-vmss000002 Pulling image "mcr.microsoft.com/mssql/bdc/mssql-service-proxy:2019-CU4-ubuntu-16.04"
Normal Pulled 11m kubelet, aks-agentpool-34156060-vmss000002 Successfully pulled image "mcr.microsoft.com/mssql/bdc/mssql-service-proxy:2019-CU4-ubuntu-16.04"
Normal Created 11m kubelet, aks-agentpool-34156060-vmss000002 Created container service-proxy
Normal Started 11m kubelet, aks-agentpool-34156060-vmss000002 Started container service-proxy
Normal Pulling 11m kubelet, aks-agentpool-34156060-vmss000002 Pulling image "mcr.microsoft.com/mssql/bdc/mssql-monitor-fluentbit:2019-CU4-ubuntu-16.04"
Normal Pulled 11m kubelet, aks-agentpool-34156060-vmss000002 Successfully pulled image "mcr.microsoft.com/mssql/bdc/mssql-monitor-fluentbit:2019-CU4-ubuntu-16.04"
Normal Created 11m kubelet, aks-agentpool-34156060-vmss000002 Created container fluentbit
Normal Started 11m kubelet, aks-agentpool-34156060-vmss000002 Started container fluentbit
Warning Unhealthy 10m (x6 over 11m) kubelet, aks-agentpool-34156060-vmss000002 Readiness probe failed: cat: /var/run/container.ready: No such file or directory
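The probe in this event is an exec-style readiness probe. The kubelet runs `cat /var/run/container.ready` inside the container and treats exit status 0 as ready and any non-zero status as not ready, so the probe keeps failing until something creates that file. The `10m (x6 over 11m)` annotation means the same warning was recorded six times, first seen 11 minutes and last seen 10 minutes before `describe` ran. Below is a minimal local emulation of a single probe attempt. It assumes a POSIX `cat` is on the PATH, and the function name `exec_probe` is hypothetical, not part of Kubernetes.

```python
# Minimal emulation of one exec readiness probe attempt: run a command and
# treat exit status 0 as "ready"; any other status, a failure to start the
# command, or exceeding the timeout counts as "not ready".
import subprocess
from typing import Sequence


def exec_probe(command: Sequence[str], timeout_seconds: float = 1.0) -> bool:
    """Return True if the command exits with status 0 within the timeout.

    The 1-second default mirrors the default probe timeoutSeconds.
    """
    try:
        completed = subprocess.run(
            list(command),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Command could not be started or ran too long: a failed probe.
        return False
    return completed.returncode == 0


if __name__ == "__main__":
    # Same check as the event above: fails until the marker file exists.
    print(exec_probe(["cat", "/var/run/container.ready"]))
```

The sketch models only one attempt. The kubelet repeats the probe every `periodSeconds` and uses `failureThreshold` and `successThreshold` to decide when to change the pod's ready condition.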
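I have tried twice, and both times it could not get past this point. From the links, it looks like this issue has only existed since last month. Can someone point me in the right direction?
Logs from the agent pod: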
2020/06/13 16:25:35 Setting the directories for 'agent:agent' owner with '-rwxrwxr-x' mode: [/var/opt /var/log /var/run/secrets /var/run/secrets/keytabs /var/run/secrets/certificates /var/run/secrets/credentials /var/opt/agent /var/log/agent /var/run/agent]
2020/06/13 16:25:35 Setting the directories for 'agent:agent' owner with '-rwxrwx---' mode: [/var/opt/agent /var/log/agent /var/run/agent]
2020/06/13 16:25:35 Searching agent configuration file at /opt/agent/conf/mgmtproxy.json
2020/06/13 16:25:35 Searching agent configuration file at /opt/agent/conf/agent.json
2020/06/13 16:25:35.777955 Changed the container umask from '-----w--w-' to '--------w-'
2020/06/13 16:25:35.778031 Setting the directories for 'supervisor:supervisor' owner with '-rwxrwx---' mode: [/var/log/supervisor/log /var/opt/supervisor /var/log/supervisor /var/run/supervisor]
2020/06/13 16:25:35.778170 Setting the directories for 'fluentbit:fluentbit' owner with '-rwxrwx---' mode: [/var/opt/fluentbit /var/log/fluentbit /var/run/fluentbit]
2020/06/13 16:25:35.778411 Agent configuration: {"PodType":"mgmtproxy","ContainerName":"fluentbit","GrpcPort":8311,"HttpsPort":8411,"ScaledSetKind":"ReplicaSet","securityPolicy":"certificate","dnsServicesToWaitFor":null,"cronJobs":null,"serviceJobs":null,"healthModules":null,"logRotation":{"agentLogMaxSize":500,"agentLogRotateCount":3,"serviceLogRotateCount":10},"fileMap":{"fluentbit-certificate.pem":"/var/run/secrets/certificates/fluentbit/fluentbit-certificate.pem","fluentbit-privatekey.pem":"/var/run/secrets/certificates/fluentbit/fluentbit-privatekey.pem","krb5.conf":"/etc/krb5.conf","nsswitch.conf":"/etc/nsswitch.conf","resolv.conf":"/etc/resolv.conf","smb.conf":"/etc/samba/smb.conf"},"userPermissions":{"agent":{"user":"agent","group":"agent","mode":"0770","modeSetgid":false,"directories":[]},"fluentbit":{"user":"fluentbit","group":"","mode":"","modeSetgid":false,"directories":[]},"fundamental":{"user":"agent","group":"agent","mode":"0775","modeSetgid":false,"directories":["/var/opt","/var/log","/var/run/secrets","/var/run/secrets/keytabs","/var/run/secrets/certificates","/var/run/secrets/credentials"]},"supervisor":{"user":"supervisor","group":"supervisor","mode":"0770","modeSetgid":false,"directories":["/var/log/supervisor/log"]}},"fileIgnoreList":["agent-certificate.pem","agent-privatekey.pem"],"InstanceId":"t4KLx1m5vDsHCHc038KgKHH5HOcQVR0Z","ContainerId":"","StartServicesImmediately":false,"DisableFileDownloads":false,"DisableHealthChecks":false,"serviceFencingEnabled":false,"isPrivileged":true,"IsConfigurationManagerEnabled":false,"LWriter":{"filename":"/var/log/agent/agent.log","maxsize":500,"maxage":0,"maxbackups":10,"localtime":true,"compress":false}}
2020/06/13 16:25:36.316209 Attempting to join cluster...
2020/06/13 16:25:36.316301 Source directory /var/opt/secrets/certificates/ca does not exist
2020/06/13 16:25:36.316520 [Reaper] Starting the signal loop for reaper
2020/06/13 16:25:40.642164 [Reaper] Received SIGCHLD signal. Starting process reaper.
2020/06/13 16:25:40.652703 Starting secure gRPC listener on 0.0.0.0:8311
2020/06/13 16:25:40.943805 Cluster join successful.
2020/06/13 16:25:40.943846 Stopping gRPC listener on 0.0.0.0:8311
2020/06/13 16:25:40.944704 Getting manifest from controller...
2020/06/13 16:25:40.964774 Downloading '/config/scaledsets/mgmtproxy/containers/fluentbit/files/fluentbit-certificate.pem' from controller...
2020/06/13 16:25:40.964816 Downloading '/config/scaledsets/mgmtproxy/containers/fluentbit/files/fluentbit-privatekey.pem' from controller...
2020/06/13 16:25:40.987309 Stored 1206 bytes to /var/run/secrets/certificates/fluentbit/fluentbit-certificate.pem
2020/06/13 16:25:40.992108 Stored 1694 bytes to /var/run/secrets/certificates/fluentbit/fluentbit-privatekey.pem
2020/06/13 16:25:40.992235 Agent is ready.
2020/06/13 16:25:40.992348 Starting supervisord with command: '[supervisord --nodaemon -c /etc/supervisord.conf]'
2020/06/13 16:25:40.992719 Started supervisord with pid=1437
2020/06/13 16:25:40.993030 Starting secure gRPC listener on 0.0.0.0:8311
2020/06/13 16:25:40.996580 Starting HTTPS listener on 0.0.0.0:8411
2020/06/13 16:25:41.998667 [READINESS] Not all supervisord processes are ready. Attempts: 1, Max attempts: 250
2020/06/13 16:25:41.999567 Loading go plugin plugins/bdc.so
2020/06/13 16:25:41.999588 Loading go plugin plugins/platform.so
2020/06/13 16:25:41.999600 Starting the health monitoring, number of modules: 2, services: ["fluentbit","agent"]
2020/06/13 16:25:41.999605 Starting the health service
2020/06/13 16:25:41.999609 Starting the health durable store
2020/06/13 16:25:41.999614 Loading existing health properties from /var/opt/agent/health/health-properties-main.gob
2020/06/13 16:25:41.999642 No existing file path for file: /var/opt/agent/health/health-properties-main.gob
2020/06/13 16:25:42.640719 Adding a new plugin plugins/bdc.so
2020/06/13 16:25:43.302872 Adding a new plugin plugins/platform.so
2020/06/13 16:25:43.302932 Created a health module watcher for service 'fluentbit'
2020/06/13 16:25:43.302948 Starting a new watcher for health module: fluentbit
2020/06/13 16:25:43.302983 Starting a new watcher for health module: agent
2020/06/13 16:25:43.302992 Health monitoring started
2020/06/13 16:25:53.000908 [READINESS] All services marked as ready.
2020/06/13 16:25:53.000966 [READINESS] Container is now ready.
2020/06/13 16:26:01.995093 [MONITOR] Service states: map[fluentbit:RUNNING]
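Taken together, the kubelet events and this log suggest the probe failures were confined to startup. The agent began polling at 16:25:41 ("Attempts: 1, Max attempts: 250") and reported "Container is now ready" at 16:25:53, and the pod list shows mgmtproxy-x4ctb at 2/2 ready. That pattern fits an agent that creates the marker file once supervisord's services are up, although the log does not state this explicitly.

Below is a hypothetical sketch of such a readiness gate. It is not the BDC agent's actual code. `wait_until_ready`, the check callback, and the marker path are illustrative assumptions.

```python
# Hypothetical readiness gate (not the BDC agent's implementation): poll an
# "are all services ready?" check up to max_attempts times and, once it
# passes, create the marker file that an exec probe such as
# `cat /var/run/container.ready` looks for.
import tempfile
import time
from pathlib import Path
from typing import Callable


def wait_until_ready(
    all_services_ready: Callable[[], bool],
    marker: Path,
    max_attempts: int = 250,
    interval_seconds: float = 1.0,
) -> bool:
    """Create `marker` and return True once the check passes; else False."""
    for attempt in range(1, max_attempts + 1):
        if all_services_ready():
            marker.touch()  # from now on the probe's `cat` exits with 0
            return True
        print(f"Not all processes are ready. Attempts: {attempt}, Max attempts: {max_attempts}")
        time.sleep(interval_seconds)
    return False  # marker never created, so the probe keeps failing


if __name__ == "__main__":
    calls = {"n": 0}

    def fake_check() -> bool:
        # Pretend the services come up on the third poll.
        calls["n"] += 1
        return calls["n"] >= 3

    with tempfile.TemporaryDirectory() as tmp:
        marker = Path(tmp) / "container.ready"
        print(wait_until_ready(fake_check, marker, max_attempts=5, interval_seconds=0))
        print(marker.exists())
```

Under this model, probe failures during the polling window are expected. Failures only indicate a real problem if they continue after the gate reports ready, or if the attempts run out.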
Solution
All,
I finally resolved it.
There were several problems with our Azure policies and network policies.
(1) It was not allowing new IP addresses to be assigned to the load balancer.
(2) The gateway proxy was not getting new IP addresses because we had run out of our quota of 10 IPs.
(3) My desktop, from which I started the deployment, could not reach the controller service IP addresses and port.
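Item (3) is a connectivity problem rather than a pod problem. `ping` uses ICMP and cannot test a specific port, so a TCP connection attempt is a more direct way to check that the deployment machine can reach the controller endpoint. Below is a minimal sketch. The host and port in the usage block are placeholders, not values from this deployment.

```python
# Minimal TCP reachability check: can this machine open a connection to host:port?
import socket


def can_reach(host: str, port: int, timeout_seconds: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError
        # subclass) and DNS resolution failures (socket.gaierror).
        return False


if __name__ == "__main__":
    # Placeholder values: substitute the controller service's external IP and port.
    CONTROLLER_HOST = "203.0.113.10"  # hypothetical (documentation address range)
    CONTROLLER_PORT = 30080           # hypothetical
    print(can_reach(CONTROLLER_HOST, CONTROLLER_PORT))
```

A `True` result only shows that the TCP handshake succeeds. It does not confirm that the service behind the port is healthy.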
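We fixed these problems one by one, and we are now at the final stage.
The IP addresses are static, but they are generated dynamically, so we cannot configure them in advance. How have others handled this with their network/Azure infrastructure teams?
Thanks, rgn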