Starting netdata with postStart does not work as expected

Problem description

We want to use netdata to monitor application performance without rebuilding every application image, so we are trying to start it from a postStart hook.

A postStart hook that simply echoes some log output works, but netdata does not start successfully from postStart with the following configuration:

    image: 10.18.210.178:40080/k8s-deploy/netdata:test4
    imagePullPolicy: IfNotPresent
    lifecycle:
      postStart:
        exec:
          command:
          - /bin/sh
          - -c
          - sleep 10; /usr/sbin/netdata -p 19999 -u ssdepg

However, netdata does start when the sleep 10 command is removed:

    image: 10.18.210.178:40080/k8s-deploy/netdata:test4
    imagePullPolicy: IfNotPresent
    lifecycle:
      postStart:
        exec:
          command:
          - /bin/sh
          - -c
          - /usr/sbin/netdata -p 19999 -u ssdepg

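With both configurations the application Pod reaches the Running state. The only difference is that in the first case no netdata process is visible.

Neither the pod describe output, the netdata log, nor the Kubernetes logs show any indication of an error.

Can anyone explain why the sleep causes this?

To clarify: the command line itself is fine. The netdata log below shows that Kubernetes launched it through postStart, yet it did not succeed (the process cannot be found with the "ps" command).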

2019-07-18 08:44:59: netdata INFO  : MAIN : Executing /usr/libexec/netdata/plugins.d/system-info.sh
2019-07-18 08:44:59: netdata INFO  : MAIN : NETDATA_SYSTEM_OS_NAME="CentOS Linux"
2019-07-18 08:44:59: netdata INFO  : MAIN : NETDATA_SYSTEM_OS_ID=centos
2019-07-18 08:44:59: netdata INFO  : MAIN : NETDATA_SYSTEM_OS_ID_LIKE=rhel fedora
2019-07-18 08:44:59: netdata INFO  : MAIN : NETDATA_SYSTEM_OS_VERSION=7 (Core)
2019-07-18 08:44:59: netdata INFO  : MAIN : NETDATA_SYSTEM_OS_VERSION_ID=7
2019-07-18 08:44:59: netdata INFO  : MAIN : NETDATA_SYSTEM_OS_DETECTION=/etc/os-release
2019-07-18 08:44:59: netdata INFO  : MAIN : NETDATA_SYSTEM_KERNEL_NAME=Linux
2019-07-18 08:44:59: netdata INFO  : MAIN : NETDATA_SYSTEM_KERNEL_VERSION=3.10.0-327.el7.x86_64
2019-07-18 08:44:59: netdata INFO  : MAIN : NETDATA_SYSTEM_ARCHITECTURE=x86_64
2019-07-18 08:44:59: netdata INFO  : MAIN : NETDATA_SYSTEM_VIRTUALIZATION=none
2019-07-18 08:44:59: netdata INFO  : MAIN : NETDATA_SYSTEM_VIRT_DETECTION=systemd-detect-virt
2019-07-18 08:44:59: netdata INFO  : MAIN : NETDATA_SYSTEM_CONTAINER=none
2019-07-18 08:44:59: netdata INFO  : MAIN : NETDATA_SYSTEM_CONTAINER_DETECTION=systemd-detect-virt
2019-07-18 08:44:59: netdata INFO  : MAIN : /usr/libexec/netdata/plugins.d/anonymous-statistics.sh 'START' '-' '-'
2019-07-18 08:45:01: netdata ERROR : MAIN : child pid 56 exited with code 28.
2019-07-18 08:45:01: netdata INFO  : MAIN : resources control: allowed file descriptors: soft = 655360, max = 655360
2019-07-18 08:45:01: netdata INFO  : MAIN : Out-Of-Memory (OOM) score is already set to the wanted value 999
2019-07-18 08:45:01: netdata INFO  : MAIN : Adjusted netdata scheduling policy to idle (5), with priority 0.
2019-07-18 08:45:01: netdata INFO  : MAIN : Running with process scheduling policy 'idle'
2019-07-18 08:45:01: netdata INFO  : MAIN : netdata started on pid 83.
2019-07-18 08:45:01: netdata INFO  : MAIN : CONFIG: cannot load user config '/etc/netdata/stream.conf'. Will try stock config.
2019-07-18 08:45:01: netdata INFO  : MAIN : Host 'nginx-test-0717-1003812089-288d5' (at registry as 'nginx-test-0717-1003812089-288d5') with guid '54cb87fe-a938-11e9-8cc8-ca282c4f3765' initialized, os 'linux', timezone 'UTC', tags '', program_name 'netdata', program_version 'v1.15.0', update every 5, memory mode save, history entries 924, streaming disabled (to '' with api key ''), health disabled, cache_dir '/var/cache/netdata', varlib_dir '/var/lib/netdata', health_log '/var/lib/netdata/health/health-log.db', alarms default handler '/usr/libexec/netdata/plugins.d/alarm-notify.sh', alarms default recipient 'root'
2019-07-18 08:45:01: netdata INFO  : MAIN : SYSTEM_INFO: free 0x1057e90
2019-07-18 08:45:01: netdata INFO  : PLUGIN[proc] : thread created with task id 84
2019-07-18 08:45:01: netdata INFO  : STATSD : thread created with task id 85
2019-07-18 08:45:01: netdata INFO  : BACKENDS : thread created with task id 86
2019-07-18 08:45:01: netdata INFO  : WEB_SERVER[static1] : thread created with task id 87
2019-07-18 08:45:01: netdata INFO  : MAIN : netdata initialization completed. Enjoy real-time performance monitoring!
2019-07-18 08:45:01: netdata INFO  : HEALTH : thread created with task id 89
2019-07-18 08:45:01: netdata INFO  : PLUGINSD : thread created with task id 88
2019-07-18 08:45:01: netdata INFO  : PLUGINSD[apps] : thread created with task id 90
2019-07-18 08:45:01: netdata ERROR : PLUGINSD : cannot open plugins directory '/etc/netdata/custom-plugins.d' (errno 2, No such file or directory)
2019-07-18 08:45:01: netdata INFO  : PLUGINSD[apps] : connected to '/usr/libexec/netdata/plugins.d/apps.plugin' running on pid 91
2019-07-18 08:45:01: netdata INFO  : WEB_SERVER[static1] : 2019-07-18 08:45:01: apps.plugin ERROR : MAIN : PROCFILE: Cannot open file '/etc/netdata/apps_groups.conf' (errno 2, No such file or directory)
2019-07-18 08:45:01: apps.plugin INFO  : MAIN : Cannot read process groups configuration file '/etc/netdata/apps_groups.conf'. Will try '/usr/lib/netdata/conf.d/apps_groups.conf'
2019-07-18 08:45:01: apps.plugin INFO  : MAIN : Loaded config file '/usr/lib/netdata/conf.d/apps_groups.conf'
2019-07-18 08:45:01: apps.plugin INFO  : MAIN : started on pid 91

Tags: kubernetes

Solution


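It is worth following the form suggested in the official Kubernetes documentation, with the command written as a single array: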

    image: 10.18.210.178:40080/k8s-deploy/netdata:test4
    imagePullPolicy: IfNotPresent
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh", "-c", "sleep 10; /usr/sbin/netdata -p 19999 -u ssdepg"]
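
For context, here is a minimal sketch of where the lifecycle block sits inside a complete Pod manifest. The Pod name netdata-poststart-demo and the container name app are hypothetical placeholders, not taken from the original deployment; the image and command are the ones from the question. Per the Kubernetes documentation, the postStart handler runs asynchronously with respect to the container's entrypoint, and the container is not reported as Running until the handler returns.

```yaml
# Sketch only: metadata.name and the container name are hypothetical placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: netdata-poststart-demo   # hypothetical
spec:
  containers:
  - name: app                    # hypothetical
    image: 10.18.210.178:40080/k8s-deploy/netdata:test4
    imagePullPolicy: IfNotPresent
    lifecycle:
      postStart:
        exec:
          # Each array element is one argv entry; the shell script is a single string.
          command: ["/bin/sh", "-c", "sleep 10; /usr/sbin/netdata -p 19999 -u ssdepg"]
```

This only illustrates the placement and syntax of the hook. Whether it changes the behavior described in the question would still need to be verified on the cluster.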
