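jenkins - What could cause a Kubernetes Jenkins slave pod to start and then hang
Problem description
I am using Jenkins on Kubernetes to build projects, but sometimes when Jenkins launches a pod, it shows "starting....." and then gets suspended. When I check the log output, it shows a 404: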
HTTP ERROR 404 Not Found
URI: /computer/default-j07v7/log
STATUS: 404
MESSAGE: Not Found
SERVLET: Stapler
Powered by Jetty:// 9.4.27.v20200227
The error looks like this:
The pod gets suspended and is restarted again and again. The pod creation events look normal:
Normal Scheduled default-scheduler Successfully assigned infrastructure/default-v7m44 to k8sslave3
Normal Pulled 1 2020-08-16T08:29:36Z 2020-08-16T08:29:36Z kubelet Container image "jenkins/jnlp-slave:3.27-1" already present on machine
Normal Created 1 2020-08-16T08:29:36Z 2020-08-16T08:29:36Z kubelet Created container jnlp
Normal Started 1 2020-08-16T08:29:36Z 2020-08-16T08:29:36Z kubelet Started container jnlp
What should I do to fix this? After trying for several days, I found that if I change any parameter of the pod template, the agent immediately becomes suspended; if I keep the defaults, the agent starts normally. This is a weird problem and it confuses me. Here is my Jenkins master deployment YAML:
kind: Deployment
apiVersion: apps/v1
metadata:
  name: jenkins
  namespace: infrastructure
  selfLink: /apis/apps/v1/namespaces/infrastructure/deployments/jenkins
  uid: 3df24fd6-ffaf-4f17-8b02-a2904cabbf95
  resourceVersion: '1707498'
  generation: 38
  creationTimestamp: '2020-07-18T14:48:47Z'
  labels:
    app.kubernetes.io/component: jenkins-master
    app.kubernetes.io/instance: jenkins
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: jenkins
    helm.sh/chart: jenkins-2.4.1
  annotations:
    deployment.kubernetes.io/revision: '10'
    meta.helm.sh/release-name: jenkins
    meta.helm.sh/release-namespace: infrastructure
  managedFields:
    - manager: Go-http-client
      operation: Update
      apiVersion: apps/v1
      time: '2020-08-02T10:08:04Z'
      fieldsType: FieldsV1
    - manager: dashboard
      operation: Update
      apiVersion: apps/v1
      time: '2020-08-17T14:27:59Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:spec':
          'f:template':
            'f:spec':
              'f:containers':
                'k:{"name":"jenkins"}':
                  'f:volumeMounts':
                    'k:{"mountPath":"/usr/bin/docker"}':
                      .: {}
                      'f:mountPath': {}
                      'f:name': {}
                    'k:{"mountPath":"/var/run/docker.sock"}':
                      .: {}
                      'f:mountPath': {}
                      'f:name': {}
              'f:securityContext':
                'f:runAsUser': {}
              'f:volumes':
                'k:{"name":"docker"}':
                  .: {}
                  'f:hostPath':
                    .: {}
                    'f:path': {}
                    'f:type': {}
                  'f:name': {}
                'k:{"name":"dockersock"}':
                  .: {}
                  'f:hostPath':
                    .: {}
                    'f:path': {}
                    'f:type': {}
                  'f:name': {}
    - manager: kube-controller-manager
      operation: Update
      apiVersion: apps/v1
      time: '2020-08-18T16:14:00Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            'f:deployment.kubernetes.io/revision': {}
        'f:status':
          'f:availableReplicas': {}
          'f:conditions':
            .: {}
            'k:{"type":"Available"}':
              .: {}
              'f:lastTransitionTime': {}
              'f:lastUpdateTime': {}
              'f:message': {}
              'f:reason': {}
              'f:status': {}
              'f:type': {}
            'k:{"type":"Progressing"}':
              .: {}
              'f:lastTransitionTime': {}
              'f:lastUpdateTime': {}
              'f:message': {}
              'f:reason': {}
              'f:status': {}
              'f:type': {}
          'f:observedGeneration': {}
          'f:readyReplicas': {}
          'f:replicas': {}
          'f:updatedReplicas': {}
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: jenkins-master
      app.kubernetes.io/instance: jenkins
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: jenkins-master
        app.kubernetes.io/instance: jenkins
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: jenkins
        helm.sh/chart: jenkins-2.4.1
      annotations:
        checksum/config: 60990c68bb90ec59c79d56498da29d250d8da13cfbb9c35cad53f0cd789f318b
    spec:
      volumes:
        - name: plugins
          emptyDir: {}
        - name: tmp
          emptyDir: {}
        - name: jenkins-config
          configMap:
            name: jenkins
            defaultMode: 420
        - name: secrets-dir
          emptyDir: {}
        - name: plugin-dir
          emptyDir: {}
        - name: jenkins-home
          persistentVolumeClaim:
            claimName: jenkins
        - name: sc-config-volume
          emptyDir: {}
        - name: dockersock
          hostPath:
            path: /var/run/docker.sock
            type: ''
        - name: docker
          hostPath:
            path: /usr/bin/docker
            type: ''
      initContainers:
        - name: copy-default-config
          image: 'jenkins/jenkins:lts'
          command:
            - sh
            - /var/jenkins_config/apply_config.sh
          env:
            - name: ADMIN_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: jenkins
                  key: jenkins-admin-password
            - name: ADMIN_USER
              valueFrom:
                secretKeyRef:
                  name: jenkins
                  key: jenkins-admin-user
          resources:
            limits:
              cpu: '2'
              memory: 4Gi
            requests:
              cpu: 50m
              memory: 256Mi
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: jenkins-home
              mountPath: /var/jenkins_home
            - name: jenkins-config
              mountPath: /var/jenkins_config
            - name: secrets-dir
              mountPath: /usr/share/jenkins/ref/secrets/
            - name: plugins
              mountPath: /usr/share/jenkins/ref/plugins
            - name: plugin-dir
              mountPath: /var/jenkins_plugins
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: Always
      containers:
        - name: jenkins
          image: 'jenkins/jenkins:lts'
          args:
            - '--argumentsRealm.passwd.$(ADMIN_USER)=$(ADMIN_PASSWORD)'
            - '--argumentsRealm.roles.$(ADMIN_USER)=admin'
            - '--httpPort=8080'
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
            - name: slavelistener
              containerPort: 50000
              protocol: TCP
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: JAVA_OPTS
              value: |
                -Dcasc.reload.token=$(POD_NAME)
            - name: JENKINS_OPTS
            - name: JENKINS_SLAVE_AGENT_PORT
              value: '50000'
            - name: ADMIN_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: jenkins
                  key: jenkins-admin-password
            - name: ADMIN_USER
              valueFrom:
                secretKeyRef:
                  name: jenkins
                  key: jenkins-admin-user
            - name: CASC_JENKINS_CONFIG
              value: /var/jenkins_home/casc_configs
          resources:
            limits:
              cpu: '2'
              memory: 4Gi
            requests:
              cpu: 50m
              memory: 256Mi
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: jenkins-home
              mountPath: /var/jenkins_home
            - name: jenkins-config
              readOnly: true
              mountPath: /var/jenkins_config
            - name: secrets-dir
              mountPath: /usr/share/jenkins/ref/secrets/
            - name: plugin-dir
              mountPath: /usr/share/jenkins/ref/plugins/
            - name: sc-config-volume
              mountPath: /var/jenkins_home/casc_configs
            - name: dockersock
              mountPath: /var/run/docker.sock
            - name: docker
              mountPath: /usr/bin/docker
          livenessProbe:
            httpGet:
              path: /login
              port: http
              scheme: HTTP
            initialDelaySeconds: 90
            timeoutSeconds: 5
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 5
          readinessProbe:
            httpGet:
              path: /login
              port: http
              scheme: HTTP
            initialDelaySeconds: 60
            timeoutSeconds: 5
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: Always
        - name: jenkins-sc-config
          image: 'kiwigrid/k8s-sidecar:0.1.144'
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: LABEL
              value: jenkins-jenkins-config
            - name: FOLDER
              value: /var/jenkins_home/casc_configs
            - name: NAMESPACE
              value: infrastructure
            - name: REQ_URL
              value: >-
                http://localhost:8080/reload-configuration-as-code/?casc-reload-token=$(POD_NAME)
            - name: REQ_METHOD
              value: POST
            - name: REQ_RETRY_CONNECT
              value: '10'
          resources: {}
          volumeMounts:
            - name: sc-config-volume
              mountPath: /var/jenkins_home/casc_configs
            - name: jenkins-home
              mountPath: /var/jenkins_home
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      serviceAccountName: jenkins
      serviceAccount: jenkins
      securityContext:
        runAsUser: 0
        fsGroup: 976
      schedulerName: default-scheduler
  strategy:
    type: Recreate
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
status:
  observedGeneration: 38
  replicas: 1
  updatedReplicas: 1
  readyReplicas: 1
  availableReplicas: 1
  conditions:
    - type: Progressing
      status: 'True'
      lastUpdateTime: '2020-08-17T14:45:20Z'
      lastTransitionTime: '2020-08-17T14:45:20Z'
      reason: NewReplicaSetAvailable
      message: ReplicaSet "jenkins-7454db64f6" has successfully progressed.
    - type: Available
      status: 'True'
      lastUpdateTime: '2020-08-18T16:14:00Z'
      lastTransitionTime: '2020-08-18T16:14:00Z'
      reason: MinimumReplicasAvailable
      message: Deployment has minimum availability.
Here is part of the log output from the master pod:
2020-08-21 16:44:40.381+0000 [id=955] WARNING i.f.k.c.d.i.WatchConnectionManager$1#onFailure: Exec Failure
java.util.concurrent.RejectedExecutionException: Task okhttp3.RealCall$AsyncCall@2fb3e877 rejected from java.util.concurrent.ThreadPoolExecutor@9ce8b47[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 18]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
at okhttp3.RealCall$AsyncCall.executeOn(RealCall.java:183)
Caused: java.io.InterruptedIOException: executor rejected
at okhttp3.RealCall$AsyncCall.executeOn(RealCall.java:186)
at okhttp3.Dispatcher.promoteAndExecute(Dispatcher.java:186)
at okhttp3.Dispatcher.enqueue(Dispatcher.java:137)
at okhttp3.RealCall.enqueue(RealCall.java:127)
at okhttp3.internal.ws.RealWebSocket.connect(RealWebSocket.java:193)
at okhttp3.OkHttpClient.newWebSocket(OkHttpClient.java:435)
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.runWatch(WatchConnectionManager.java:158)
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$1200(WatchConnectionManager.java:50)
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2$1.execute(WatchConnectionManager.java:321)
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$NamedRunnable.run(WatchConnectionManager.java:410)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-08-21 16:44:45.239+0000 [id=33] INFO hudson.slaves.NodeProvisioner#lambda$update$6: default-3393d provisioning successfully completed. We have now 3 computer(s)
2020-08-21 16:44:45.241+0000 [id=2765] INFO o.c.j.p.k.KubernetesLauncher#launch: Created Pod: infrastructure/default-3393d
2020-08-21 16:44:45.302+0000 [id=2826] INFO o.internal.platform.Platform#log: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
2020-08-21 16:44:45.350+0000 [id=2765] INFO o.internal.platform.Platform#log: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
2020-08-21 16:44:55.363+0000 [id=2765] WARNING o.c.j.p.k.KubernetesLauncher#launch: Error in provisioning; agent=KubernetesSlave name: default-3393d, template=PodTemplate{inheritFrom='', name='default', namespace='', hostNetwork=false, activeDeadlineSeconds=10, label='jenkins-jenkins-slave ', serviceAccount='default', nodeSelector='', nodeUsageMode=NORMAL, workspaceVolume=EmptyDirWorkspaceVolume [memory=false], containers=[ContainerTemplate{name='jnlp', image='jenkins/jnlp-slave:3.27-1', workingDir='/home/jenkins', command='/bin/sh -c', args='${computer.jnlpmac} ${computer.name}', resourceRequestCpu='512m', resourceRequestMemory='512Mi', resourceLimitCpu='512m', resourceLimitMemory='512Mi', envVars=[ContainerEnvVar [getValue()=http://jenkins.infrastructure.svc.cluster.local:8080, getKey()=JENKINS_URL]], livenessProbe=org.csanchez.jenkins.plugins.kubernetes.ContainerLivenessProbe@5187faf3}]}
java.lang.IllegalStateException: Pod has terminated containers: infrastructure/default-3393d (jnlp)
at org.csanchez.jenkins.plugins.kubernetes.AllContainersRunningPodWatcher.periodicAwait(AllContainersRunningPodWatcher.java:133)
at org.csanchez.jenkins.plugins.kubernetes.AllContainersRunningPodWatcher.periodicAwait(AllContainersRunningPodWatcher.java:154)
at org.csanchez.jenkins.plugins.kubernetes.AllContainersRunningPodWatcher.await(AllContainersRunningPodWatcher.java:94)
at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.launch(KubernetesLauncher.java:140)
at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:296)
at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-08-21 16:44:55.363+0000 [id=2765] INFO o.c.j.p.k.KubernetesSlave#_terminate: Terminating Kubernetes instance for agent default-3393d
Terminated Kubernetes instance for agent infrastructure/default-3393d
Disconnected computer default-3393d
2020-08-21 16:44:55.383+0000 [id=2765] INFO o.c.j.p.k.KubernetesSlave#deleteSlavePod: Terminated Kubernetes instance for agent infrastructure/default-3393d
2020-08-21 16:44:55.383+0000 [id=2765] INFO o.c.j.p.k.KubernetesSlave#_terminate: Disconnected computer default-3393d
2020-08-21 16:45:05.198+0000 [id=42] INFO o.c.j.p.k.KubernetesCloud#provision: Excess workload after pending Kubernetes agents: 1
2020-08-21 16:45:05.198+0000 [id=42] INFO o.c.j.p.k.KubernetesCloud#provision: Template for label null: default
2020-08-21 16:45:12.383+0000 [id=955] WARNING i.f.k.c.d.i.WatchConnectionManager$1#onFailure: Exec Failure
java.util.concurrent.RejectedExecutionException: Task okhttp3.RealCall$AsyncCall@6c6c7a45 rejected from java.util.concurrent.ThreadPoolExecutor@9ce8b47[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 18]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
at okhttp3.RealCall$AsyncCall.executeOn(RealCall.java:183)
Caused: java.io.InterruptedIOException: executor rejected
at okhttp3.RealCall$AsyncCall.executeOn(RealCall.java:186)
at okhttp3.Dispatcher.promoteAndExecute(Dispatcher.java:186)
at okhttp3.Dispatcher.enqueue(Dispatcher.java:137)
at okhttp3.RealCall.enqueue(RealCall.java:127)
at okhttp3.internal.ws.RealWebSocket.connect(RealWebSocket.java:193)
at okhttp3.OkHttpClient.newWebSocket(OkHttpClient.java:435)
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.runWatch(WatchConnectionManager.java:158)
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$1200(WatchConnectionManager.java:50)
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2$1.execute(WatchConnectionManager.java:321)
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$NamedRunnable.run(WatchConnectionManager.java:410)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-08-21 16:45:15.236+0000 [id=2765] INFO o.c.j.p.k.KubernetesLauncher#launch: Created Pod: infrastructure/default-03q6x
2020-08-21 16:45:15.252+0000 [id=36] INFO hudson.slaves.NodeProvisioner#lambda$update$6: default-03q6x provisioning successfully completed. We have now 3 computer(s)
2020-08-21 16:45:15.314+0000 [id=2824] INFO o.internal.platform.Platform#log: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
2020-08-21 16:45:15.381+0000 [id=2765] INFO o.internal.platform.Platform#log: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
2020-08-21 16:45:25.390+0000 [id=2765] WARNING o.c.j.p.k.KubernetesLauncher#launch: Error in provisioning; agent=KubernetesSlave name: default-03q6x, template=PodTemplate{inheritFrom='', name='default', namespace='', hostNetwork=false, activeDeadlineSeconds=10, label='jenkins-jenkins-slave ', serviceAccount='default', nodeSelector='', nodeUsageMode=NORMAL, workspaceVolume=EmptyDirWorkspaceVolume [memory=false], containers=[ContainerTemplate{name='jnlp', image='jenkins/jnlp-slave:3.27-1', workingDir='/home/jenkins', command='/bin/sh -c', args='${computer.jnlpmac} ${computer.name}', resourceRequestCpu='512m', resourceRequestMemory='512Mi', resourceLimitCpu='512m', resourceLimitMemory='512Mi', envVars=[ContainerEnvVar [getValue()=http://jenkins.infrastructure.svc.cluster.local:8080, getKey()=JENKINS_URL]], livenessProbe=org.csanchez.jenkins.plugins.kubernetes.ContainerLivenessProbe@5187faf3}]}
java.lang.IllegalStateException: Pod has terminated containers: infrastructure/default-03q6x (jnlp)
at org.csanchez.jenkins.plugins.kubernetes.AllContainersRunningPodWatcher.periodicAwait(AllContainersRunningPodWatcher.java:133)
at org.csanchez.jenkins.plugins.kubernetes.AllContainersRunningPodWatcher.periodicAwait(AllContainersRunningPodWatcher.java:154)
at org.csanchez.jenkins.plugins.kubernetes.AllContainersRunningPodWatcher.await(AllContainersRunningPodWatcher.java:94)
at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.launch(KubernetesLauncher.java:140)
at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:296)
at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-08-21 16:45:25.391+0000 [id=2765] INFO o.c.j.p.k.KubernetesSlave#_terminate: Terminating Kubernetes instance for agent default-03q6x
Terminated Kubernetes instance for agent infrastructure/default-03q6x
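The PodTemplate{...} value printed in each "Error in provisioning" line above is hard to read on a single line. Purely as a readability aid, here is a sketch of the same values written as a Jenkins Configuration as Code (JCasC) pod template entry. Every value is copied from the log; the key layout is an assumption based on the Kubernetes plugin's JCasC format, and the liveness probe is left out because the log only prints an object reference for it.

```yaml
# Hypothetical JCasC rendering of the PodTemplate printed in the log above.
# Values come from the log; the layout itself is an assumption.
- name: "default"
  label: "jenkins-jenkins-slave "   # the log shows a trailing space in the label
  namespace: ""
  serviceAccount: "default"
  hostNetwork: false
  activeDeadlineSeconds: 10
  nodeUsageMode: NORMAL
  workspaceVolume:
    emptyDirWorkspaceVolume:
      memory: false
  containers:
    - name: "jnlp"
      image: "jenkins/jnlp-slave:3.27-1"
      workingDir: "/home/jenkins"
      command: "/bin/sh -c"
      # "^" escapes JCasC's own ${...} variable substitution so the plugin
      # receives the literal ${computer.jnlpmac} ${computer.name}.
      args: "^${computer.jnlpmac} ^${computer.name}"
      resourceRequestCpu: "512m"
      resourceRequestMemory: "512Mi"
      resourceLimitCpu: "512m"
      resourceLimitMemory: "512Mi"
      envVars:
        - envVar:
            key: "JENKINS_URL"
            value: "http://jenkins.infrastructure.svc.cluster.local:8080"
```

Laid out this way, two settings are easier to spot and compare against the default template that, per the question, starts normally: activeDeadlineSeconds: 10 (in Kubernetes, the number of seconds a pod may be active before it is marked failed and its containers are killed) and the command/args override on the jnlp container.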
Here is a snapshot of my Kubernetes cloud template:
And here is the pod template configuration:
Solution
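I suggest making a few changes:
Leave "jenkins tunnel" empty; Jenkins will pick it up automatically. If you deployed this Jenkins instance inside the Kubernetes cluster, use the internal address for "jenkins_url", such as http://jenkins.infrastructure.svc. I am assuming your Jenkins service is named "jenkins" and is of type ClusterIP.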
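If the cloud is managed through Configuration as Code (the deployment above sets CASC_JENKINS_CONFIG), the suggestion might look roughly like the sketch below, where "Jenkins URL" maps to jenkinsUrl and "Jenkins tunnel" maps to jenkinsTunnel. The cloud name "kubernetes" is hypothetical, and port 8080 is an assumption based on the master's --httpPort=8080 and the JENKINS_URL shown in the logs; drop the port if the "jenkins" Service listens on port 80.

```yaml
# Hypothetical JCasC sketch of the suggested cloud settings.
# Assumptions: cloud name "kubernetes", a ClusterIP Service named "jenkins"
# in namespace "infrastructure" exposing port 8080.
jenkins:
  clouds:
    - kubernetes:
        name: "kubernetes"
        namespace: "infrastructure"
        # In-cluster address of the Jenkins Service instead of an external URL.
        jenkinsUrl: "http://jenkins.infrastructure.svc:8080"
        # jenkinsTunnel is intentionally not set, i.e. left empty.
```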