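kubernetes - Starting netdata with postStart does not work as expected
Problem description
We want to use netdata to monitor application performance without updating every application image, so we want to use a postStart hook to achieve this.
We can use postStart to echo some logs, but netdata fails to start from postStart with the following configuration: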
image: 10.18.210.178:40080/k8s-deploy/netdata:test4
imagePullPolicy: IfNotPresent
lifecycle:
  postStart:
    exec:
      command:
        - /bin/sh
        - -c
        - sleep 10; /usr/sbin/netdata -p 19999 -u ssdepg
However, netdata does start when the sleep 10 command is left out:
image: 10.18.210.178:40080/k8s-deploy/netdata:test4
imagePullPolicy: IfNotPresent
lifecycle:
  postStart:
    exec:
      command:
        - /bin/sh
        - -c
        - /usr/sbin/netdata -p 19999 -u ssdepg
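With both configurations the application pod reaches the Running state. The only difference is that in the first case we cannot see a netdata process.
Neither pod describe, the netdata log, nor the k8s logs show any indication of an error.
Could any expert give us a hint as to why the sleep causes this?
To clarify: there is nothing wrong with the command line itself. See the netdata log below. netdata was launched by k8s through postStart, but it did not stay running (the process cannot be found with the "ps" command).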
2019-07-18 08:44:59: netdata INFO : MAIN : Executing /usr/libexec/netdata/plugins.d/system-info.sh
2019-07-18 08:44:59: netdata INFO : MAIN : NETDATA_SYSTEM_OS_NAME="CentOS Linux"
2019-07-18 08:44:59: netdata INFO : MAIN : NETDATA_SYSTEM_OS_ID=centos
2019-07-18 08:44:59: netdata INFO : MAIN : NETDATA_SYSTEM_OS_ID_LIKE=rhel fedora
2019-07-18 08:44:59: netdata INFO : MAIN : NETDATA_SYSTEM_OS_VERSION=7 (Core)
2019-07-18 08:44:59: netdata INFO : MAIN : NETDATA_SYSTEM_OS_VERSION_ID=7
2019-07-18 08:44:59: netdata INFO : MAIN : NETDATA_SYSTEM_OS_DETECTION=/etc/os-release
2019-07-18 08:44:59: netdata INFO : MAIN : NETDATA_SYSTEM_KERNEL_NAME=Linux
2019-07-18 08:44:59: netdata INFO : MAIN : NETDATA_SYSTEM_KERNEL_VERSION=3.10.0-327.el7.x86_64
2019-07-18 08:44:59: netdata INFO : MAIN : NETDATA_SYSTEM_ARCHITECTURE=x86_64
2019-07-18 08:44:59: netdata INFO : MAIN : NETDATA_SYSTEM_VIRTUALIZATION=none
2019-07-18 08:44:59: netdata INFO : MAIN : NETDATA_SYSTEM_VIRT_DETECTION=systemd-detect-virt
2019-07-18 08:44:59: netdata INFO : MAIN : NETDATA_SYSTEM_CONTAINER=none
2019-07-18 08:44:59: netdata INFO : MAIN : NETDATA_SYSTEM_CONTAINER_DETECTION=systemd-detect-virt
2019-07-18 08:44:59: netdata INFO : MAIN : /usr/libexec/netdata/plugins.d/anonymous-statistics.sh 'START' '-' '-'
2019-07-18 08:45:01: netdata ERROR : MAIN : child pid 56 exited with code 28.
2019-07-18 08:45:01: netdata INFO : MAIN : resources control: allowed file descriptors: soft = 655360, max = 655360
2019-07-18 08:45:01: netdata INFO : MAIN : Out-Of-Memory (OOM) score is already set to the wanted value 999
2019-07-18 08:45:01: netdata INFO : MAIN : Adjusted netdata scheduling policy to idle (5), with priority 0.
2019-07-18 08:45:01: netdata INFO : MAIN : Running with process scheduling policy 'idle'
2019-07-18 08:45:01: netdata INFO : MAIN : netdata started on pid 83.
2019-07-18 08:45:01: netdata INFO : MAIN : CONFIG: cannot load user config '/etc/netdata/stream.conf'. Will try stock config.
2019-07-18 08:45:01: netdata INFO : MAIN : Host 'nginx-test-0717-1003812089-288d5' (at registry as 'nginx-test-0717-1003812089-288d5') with guid '54cb87fe-a938-11e9-8cc8-ca282c4f3765' initialized, os 'linux', timezone 'UTC', tags '', program_name 'netdata', program_version 'v1.15.0', update every 5, memory mode save, history entries 924, streaming disabled (to '' with api key ''), health disabled, cache_dir '/var/cache/netdata', varlib_dir '/var/lib/netdata', health_log '/var/lib/netdata/health/health-log.db', alarms default handler '/usr/libexec/netdata/plugins.d/alarm-notify.sh', alarms default recipient 'root'
2019-07-18 08:45:01: netdata INFO : MAIN : SYSTEM_INFO: free 0x1057e90
2019-07-18 08:45:01: netdata INFO : PLUGIN[proc] : thread created with task id 84
2019-07-18 08:45:01: netdata INFO : STATSD : thread created with task id 85
2019-07-18 08:45:01: netdata INFO : BACKENDS : thread created with task id 86
2019-07-18 08:45:01: netdata INFO : WEB_SERVER[static1] : thread created with task id 87
2019-07-18 08:45:01: netdata INFO : MAIN : netdata initialization completed. Enjoy real-time performance monitoring!
2019-07-18 08:45:01: netdata INFO : HEALTH : thread created with task id 89
2019-07-18 08:45:01: netdata INFO : PLUGINSD : thread created with task id 88
2019-07-18 08:45:01: netdata INFO : PLUGINSD[apps] : thread created with task id 90
2019-07-18 08:45:01: netdata ERROR : PLUGINSD : cannot open plugins directory '/etc/netdata/custom-plugins.d' (errno 2, No such file or directory)
2019-07-18 08:45:01: netdata INFO : PLUGINSD[apps] : connected to '/usr/libexec/netdata/plugins.d/apps.plugin' running on pid 91
2019-07-18 08:45:01: netdata INFO : WEB_SERVER[static1] : 2019-07-18 08:45:01: apps.plugin ERROR : MAIN : PROCFILE: Cannot open file '/etc/netdata/apps_groups.conf' (errno 2, No such file or directory)
2019-07-18 08:45:01: apps.plugin INFO : MAIN : Cannot read process groups configuration file '/etc/netdata/apps_groups.conf'. Will try '/usr/lib/netdata/conf.d/apps_groups.conf'
2019-07-18 08:45:01: apps.plugin INFO : MAIN : Loaded config file '/usr/lib/netdata/conf.d/apps_groups.conf'
2019-07-18 08:45:01: apps.plugin INFO : MAIN : started on pid 91
Solution
It is worth following the form recommended in the official K8s documentation:
image: 10.18.210.178:40080/k8s-deploy/netdata:test4
imagePullPolicy: IfNotPresent
lifecycle:
  postStart:
    exec:
      command: ["/bin/sh", "-c", "sleep 10; /usr/sbin/netdata -p 19999 -u ssdepg"]
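For context, here is a minimal sketch of how that fragment might sit inside a complete Pod manifest. The pod name and container name are hypothetical placeholders that do not come from the question. The image and the command are copied from the configuration above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: netdata-poststart-demo   # hypothetical pod name, for illustration only
spec:
  containers:
    - name: app                  # hypothetical container name
      image: 10.18.210.178:40080/k8s-deploy/netdata:test4
      imagePullPolicy: IfNotPresent
      lifecycle:
        postStart:
          exec:
            # Inline array form of the same command as in the question
            command: ["/bin/sh", "-c", "sleep 10; /usr/sbin/netdata -p 19999 -u ssdepg"]
```

The lifecycle block belongs to an individual container entry, so the hook command runs inside that container, next to the application process.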