Docker image runs out of overlay space and causes pod evictions on GKE

Problem Description

I created a Docker image that runs a java command every minute (via an ExecutorService) and prints the output to the screen with a logger.

Most of the time the output looks like this: No requests found, sleeping for a minute
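
For context, here is a minimal sketch of the kind of scheduled loop described above (the class name, logger setup, and the polling placeholder are assumptions for illustration, not the application's actual code):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.log4j.Logger;

public class RequestPoller {
    private static final Logger LOG = Logger.getLogger(RequestPoller.class);

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Check for work once a minute; when nothing is pending, log a single line.
        scheduler.scheduleAtFixedRate(
                () -> LOG.info("No requests found, sleeping for a minute"),
                0, 1, TimeUnit.MINUTES);
    }
}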

It runs on Kubernetes Engine on Google Cloud Platform. I am running into a problem where my pod gets evicted every 5 hours with the following error:

The node was low on resource: ephemeral-storage. Container ingestion-pager-remediation was using 216Ki, which exceeds its request of 0.
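
The "request of 0" in that message means the container declares no ephemeral-storage request, so as soon as the node comes under disk pressure, any ephemeral storage the container uses already exceeds what it asked for. For comparison, a minimal sketch of how such a request and limit could be declared in the container spec (the values are only illustrative assumptions):

resources:
  requests:
    ephemeral-storage: "100Mi"
  limits:
    ephemeral-storage: "1Gi"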

What could be causing this? At first I thought the cause was failing to close Input/OutputStreams and HttpConnections. I went through the code and made sure all connections are closed, but the used space still grows over time.
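
For reference, a minimal sketch of the closing pattern that was verified, using try-with-resources (the method name and URL handling are assumptions, not the application's actual code):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public final class HttpCheck {

    // Fetch a response body and make sure both the stream and the connection are released.
    static String fetch(String endpoint) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line);
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }
}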

When I looked at disk usage, I found that my overlay usage increases over time. Below is the "Used" space with only the one java command running (two df -h samples taken a while apart):

/ # df -h
Filesystem                Size      Used Available Use% Mounted on
overlay                 291.2G    172.3G    118.8G  59% /
tmpfs                    64.0M         0     64.0M   0% /dev
tmpfs                   102.3G         0    102.3G   0% /sys/fs/cgroup
/dev/sda1               291.2G    172.3G    118.8G  59% /dev/termination-log
/dev/sda1               291.2G    172.3G    118.8G  59% /mount/javakeystore
/dev/sda1               291.2G    172.3G    118.8G  59% /mount/json
/dev/sda1               291.2G    172.3G    118.8G  59% /etc/resolv.conf
/dev/sda1               291.2G    172.3G    118.8G  59% /etc/hostname
/dev/sda1               291.2G    172.3G    118.8G  59% /etc/hosts
shm                      64.0M         0     64.0M   0% /dev/shm
tmpfs                   102.3G     12.0K    102.3G   0% /run/secrets/kubernetes.io/serviceaccount
tmpfs                   102.3G         0    102.3G   0% /proc/acpi
tmpfs                    64.0M         0     64.0M   0% /proc/kcore
tmpfs                    64.0M         0     64.0M   0% /proc/keys
tmpfs                    64.0M         0     64.0M   0% /proc/timer_list
tmpfs                   102.3G         0    102.3G   0% /proc/scsi
tmpfs                   102.3G         0    102.3G   0% /sys/firmware
/ # df -h
Filesystem                Size      Used Available Use% Mounted on
overlay                 291.2G    172.4G    118.8G  59% /
tmpfs                    64.0M         0     64.0M   0% /dev
tmpfs                   102.3G         0    102.3G   0% /sys/fs/cgroup
/dev/sda1               291.2G    172.4G    118.8G  59% /dev/termination-log
/dev/sda1               291.2G    172.4G    118.8G  59% /mount/javakeystore
/dev/sda1               291.2G    172.4G    118.8G  59% /mount/json
/dev/sda1               291.2G    172.4G    118.8G  59% /etc/resolv.conf
/dev/sda1               291.2G    172.4G    118.8G  59% /etc/hostname
/dev/sda1               291.2G    172.4G    118.8G  59% /etc/hosts
shm                      64.0M         0     64.0M   0% /dev/shm
tmpfs                   102.3G     12.0K    102.3G   0% /run/secrets/kubernetes.io/serviceaccount
tmpfs                   102.3G         0    102.3G   0% /proc/acpi
tmpfs                    64.0M         0     64.0M   0% /proc/kcore
tmpfs                    64.0M         0     64.0M   0% /proc/keys
tmpfs                    64.0M         0     64.0M   0% /proc/timer_list
tmpfs                   102.3G         0    102.3G   0% /proc/scsi
tmpfs                   102.3G         0    102.3G   0% /sys/firmware
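
One way to narrow down whether the growth is happening inside the container's own writable layer, rather than elsewhere on the node's disk, is to measure a few writable paths directly from the same shell (the paths below are only examples):

/ # du -sh /tmp /var /root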

This is the only thing running in the pod:

/ # ps -Af
PID   USER     TIME  COMMAND
    1 root      0:27 java -jar -Dlog4j.configurationFile=/ingestion-pager-remediation/log4j.properties -Dorg.slf4j.simpleLogger.defaultLogLevel=info /ingestion-pager-remediation/ingest-pa
  222 root      0:00 sh
  250 root      0:00 ps -Af

As mentioned above, it is a simple java command that makes a few http connections and then sleeps.

Does anyone know why my overlay space keeps growing over time towards 300 GB?

(Edit)

I only log to standard output, using this debug configuration:

log4j.rootLogger=DEBUG, STDOUT
log4j.logger.deng=INFO
log4j.appender.STDOUT=org.apache.log4j.ConsoleAppender
log4j.appender.STDOUT.layout=org.apache.log4j.PatternLayout
log4j.appender.STDOUT.layout.ConversionPattern=%5p [%t] (%F:%L) - %m%n
org.slf4j.simpleLogger.defaultLogLevel = info
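
Worth noting: with the root logger at DEBUG and everything going to the console appender, whatever the container writes to stdout is also persisted by the container runtime as log files on the node. If log volume turns out to matter here (an assumption, not a confirmed diagnosis), one small change would be to raise the root level:

log4j.rootLogger=INFO, STDOUT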

Tags: java, docker, kubernetes, google-kubernetes-engine

Solution

