Some programs are not started by a bash script when it runs as a crontab job, but all of them start when the script is run manually

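Problem description

Problem

I have several similar bash scripts that start about a hundred processes. The scripts run once a day at 09:12. They worked well when there were fewer processes, but as the number of processes they start grew over the last few days, I found that some processes were not being started. I ran the script manually to see what was happening, and in that case every process was started.

Some of the processes that failed to start left a core dump. It shows that the program was terminated by signal SIGABRT while calling std::thread to create a thread.

The system is Ubuntu Server 18.04.4 LTS. Both the crontab job and the manual run are executed as the same normal user.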

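What I tried

I split the script into two parts, each starting a different subset of the processes, with one crontab entry at 09:12 and the other at 09:25. Unfortunately, the first script failed to start some of its processes and the second failed to start any.

I checked whether anything had reached a system limit and found nothing that had.

Code

The bash script "Signal_IF_trend.sh" is as follows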

#!/bin/bash
## argument not given
if [ $# -eq 0 ];
then
        echo "argument:start - to start all processes, stop - to stop all processes"
        exit 0
fi

arr_IFtrend=(
    "2927" "2932" "2935" "2936" "2937" "2938" "3125" "3127" "3128" "3129" "3130" "3329" "3330" "3331"
    "3332" "3333" "3334" "3725" "3726" "3727" "3728" "3729" "3925" "3927" "3928" "3929" "3930" "4124"
    "4125" "4126" "4127" "4128" "4320" "4321" "4322" "4323" "4521" "4522" "4523" "4524" "4525" "4718"
    "4719" "4720" "4721" "4729" "5117" "5121" "5122" "5123" "5317" "5318" "5319" "5322" "5323" "5518"
    "5519" "5523" "5524" "5916" "5917" "5918" "5919" "6112" "6117" "6126" "6317" "6321" "6322" "6323"
    "6324" "6712" "6717" "6718" "6719" "6720" "6721" "7111" "7112" "7113" "7114" "7115" "7116" "7312"
    "7313" "7314" "7315" "7316" "7611" "7612" "7613" "7614" "7615" "7616" "7914" "7915" "7916" "7917"
    "8113" "8114" "8115" "8116" "8117" "8118" "8119"
)
ver_trend="v064"

ulimit -c unlimited

BASEPATH=$(cd `dirname $0`;pwd)

case "$1" in

    start)
        count=0
        for var in ${arr_IFtrend[@]}
        do
                count=$(($count+1))
                if(($count % 10 == 0))
                then
                        /bin/sleep 5
                fi
                cd $BASEPATH/Signal_IF/IFtrend_NTick_Del00/IF$var
                nohup ./IFtrend_NTickDel00_${ver_trend}_$var >/dev/null 2>&1 &
                echo $! > run.pid
                echo "Signal_IF/IFtrend_NTick_Del00/IF$var/IFtrend_NTickDel00_${ver_trend}_$var started"
        done
        ;;


    stop)
        for var in ${arr_IFtrend[@]}
        do
                cd $BASEPATH/Signal_IF/IFtrend_NTick_Del00/IF$var
                kill `cat run.pid`
                rm -rf run.pid
                echo "Signal_IF/IFtrend_NTick_Del00/IF$var/IFtrend_NTickDel00_${ver_trend}_$var stoping..."
        done
        ;;

    zip)
        for var in ${arr_IFtrend[@]}
        do
                cd $BASEPATH/Signal_IF/IFtrend_NTick_Del00/IF$var
                gzip *.log
                find . -mtime +7 -name "*.gz" -exec rm -rf {} \;
        done
        ;;
esac

exit 0
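
The start loop above records a PID but never checks whether the program survived its first moments, so an early abort only shows up in the cron mail. A minimal sketch of a launch-and-check step follows; `start_and_verify` is a hypothetical helper name, not part of the original script, and the grace period is an arbitrary choice.

```shell
#!/bin/bash
# start_and_verify: hypothetical helper (not part of the original script).
# Launches a command in the background, waits a grace period, and reports
# whether the process is still running.
# Usage: start_and_verify <grace_seconds> <command> [args...]
start_and_verify() {
    local grace="$1"
    shift
    nohup "$@" >/dev/null 2>&1 &
    local pid=$!
    sleep "$grace"
    if kill -0 "$pid" 2>/dev/null; then
        # Still alive: print the PID so the caller can store it in run.pid.
        echo "$pid"
        return 0
    fi
    # Already gone: collect the exit status and report it on stderr.
    wait "$pid"
    echo "command '$*' exited early with status $?" >&2
    return 1
}
```

Inside the loop this could replace the `nohup ... &` and `echo $! > run.pid` pair, for example `start_and_verify 1 ./IFtrend_NTickDel00_${ver_trend}_$var > run.pid || echo "IF$var failed to start"`.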

User crontab

# Edit this file to introduce tasks to be run by cron.
# 
# Each task to run has to be defined through a single line
# indicating with different fields when the task will be run
# and what command to run for the task
# 
# To define the time you can provide concrete values for
# minute (m), hour (h), day of month (dom), month (mon),
# and day of week (dow) or use '*' in these fields (for 'any').
# 
# Notice that tasks will be started based on the cron's system
# daemon's notion of time and timezones.
# 
# Output of the crontab jobs (including errors) is sent through
# email to the user the crontab file belongs to (unless redirected).
# 
# For example, you can run a backup of all your user accounts
# at 5 a.m every week with:
# 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/
# 
# For more information see the manual pages of crontab(5) and cron(8)
# 
# m h  dom mon dow   command

# IC IF T ##########################
# ----------------------------------
# start processes
0 9     * * 1-5 /home/xxsc/ProductionEnv/Signal_IC.sh start > /home/xxsc/ProductionEnv/Signal_IC_crontab.log
5 9     * * 1-5 /home/xxsc/ProductionEnv/Signal_IF_deviation.sh start > /home/xxsc/ProductionEnv/Signal_IF_devication_crontab.log
10 9    * * 1-5 /home/xxsc/ProductionEnv/Signal_IF_KNATR.sh start > /home/xxsc/ProductionEnv/Signal_IF_KNATR_crontab.log
12 9    * * 1-5 /home/xxsc/ProductionEnv/Signal_IF_trend.sh start > /home/xxsc/ProductionEnv/Signal_IF_trend_crontab.log
# 16 9    * * 1-5 /home/xxsc/ProductionEnv/Signal_IF_NTime.sh start > /home/xxsc/ProductionEnv/Signal_IF_NTime_crontab.log
25 9    * * 1-5 /home/xxsc/ProductionEnv/Signal_option.sh start > /home/xxsc/ProductionEnv/Signal_option_crontab.log
# stop processes
30 15   * * 1-5 /home/xxsc/ProductionEnv/Signal_IC.sh stop
33 15   * * 1-5 /home/xxsc/ProductionEnv/Signal_IF_deviation.sh stop
33 15   * * 1-5 /home/xxsc/ProductionEnv/Signal_IF_KNATR.sh stop
# 35 15 * * 1-5 /home/xxsc/ProductionEnv/Signal_IF_NTime.sh stop
36 15   * * 1-5 /home/xxsc/ProductionEnv/Signal_IF_trend.sh stop
39 15   * * 1-5 /home/xxsc/ProductionEnv/Signal_option.sh stop
# zip logs
30 16   * * 1-5 /home/xxsc/ProductionEnv/Signal_IC.sh zip
35 16   * * 1-5 /home/xxsc/ProductionEnv/Signal_IF_deviation.sh zip
37 16   * * 1-5 /home/xxsc/ProductionEnv/Signal_IF_KNATR.sh zip
# 40 16   * * 1-5 /home/xxsc/ProductionEnv/Signal_IF_NTime.sh zip
45 16   * * 1-5 /home/xxsc/ProductionEnv/Signal_IF_trend.sh zip
50 16   * * 1-5 /home/xxsc/ProductionEnv/Signal_Comodity.sh zip
55 16   * * 1-5 /home/xxsc/ProductionEnv/Signal_option.sh zip

gdb output for the core dump (updated after installing the glibc source)

faund@Sirius:~/debug$ gdb IFtrend_NTickDel00_v064_8118 core 
GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from IFtrend_NTickDel00_v064_8118...done.

warning: core file may not match specified executable file.
[New LWP 25832]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./IFtrend_NTickDel00_v064_8118'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) info source
Current source file is ../sysdeps/unix/sysv/linux/raise.c
Compilation directory is /build/glibc-2ORdQG/glibc-2.27/signal
Source language is c.
Producer is GNU C11 7.5.0 -mtune=generic -march=x86-64 -g -O2 -O3 -std=gnu11 -fgnu89-inline -fmerge-all-constants -frounding-math -fstack-protector-strong -fPIC -ftls-model=initial-exec -fstack-protector-strong.
Compiled with DWARF 2 debugging format.
Does not include preprocessor macro info.
(gdb) set substitute-path /build/glibc-2ORdQG/glibc-2.27 /opt/src/glibc-2.27
(gdb) frame 0
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51  }
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f9aac6d08b1 in __GI_abort () at abort.c:79
#2  0x00007f9aad0c3957 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007f9aad0c9ae6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007f9aad0c9b21 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007f9aad0c9d54 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007f9aad0c5a23 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007f9aad0f49a9 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x0000559c6fcc5319 in std::thread::thread<void (MfLogger::*)(), MfLogger*> (__f=<optimized out>, this=0x7ffc745ecc80) at /usr/include/c++/7/thread:126
#9  MfLogger::MfLogger (this=0x7ffc745ed670, logFileName=...) at /home/siko/Documents/V64/IF_Drange_trend/SharedDataStructs/MfLogger.cpp:13
#10 0x0000559c6fcc36cb in main () at /home/siko/Documents/V64/IF_Drange_trend/Strategy/main.cpp:59
(gdb) 

Mail received by the user

Subject: Cron <xxsc@Strategy2> /home/xxsc/ProductionEnv/Signal_IF_trend.sh start > /home/xxsc/ProductionEnv/Signal_IF_trend_crontab.log
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Cron-Env: <SHELL=/bin/sh>
X-Cron-Env: <HOME=/home/xxsc>
X-Cron-Env: <PATH=/usr/bin:/bin>
X-Cron-Env: <LOGNAME=xxsc>
Message-Id: <20200902011301.AA2C0B0040A@Strategy2>
Date: Wed,  2 Sep 2020 09:13:01 +0800 (CST)

/home/xxsc/ProductionEnv/Signal_IF_trend.sh: line 30: 10014 Aborted                 (core dumped) nohup ./IFtrend_NTickDel00_${ver_trend}_$var > /dev/null 2>&1  (wd: ~/ProductionEnv/Signal_IF/IFtrend_NTick_Del00/IF6324)
/home/xxsc/ProductionEnv/Signal_IF_trend.sh: line 30: 10015 Aborted                 (core dumped) nohup ./IFtrend_NTickDel00_${ver_trend}_$var > /dev/null 2>&1  (wd: ~/ProductionEnv/Signal_IF/IFtrend_NTick_Del00/IF6712)
/home/xxsc/ProductionEnv/Signal_IF_trend.sh: line 30: 10016 Aborted                 (core dumped) nohup ./IFtrend_NTickDel00_${ver_trend}_$var > /dev/null 2>&1  (wd: ~/ProductionEnv/Signal_IF/IFtrend_NTick_Del00/IF6717)
/home/xxsc/ProductionEnv/Signal_IF_trend.sh: line 30: 10017 Aborted                 (core dumped) nohup ./IFtrend_NTickDel00_${ver_trend}_$var > /dev/null 2>&1  (wd: ~/ProductionEnv/Signal_IF/IFtrend_NTick_Del00/IF6718)
/home/xxsc/ProductionEnv/Signal_IF_trend.sh: line 30: 10019 Aborted                 (core dumped) nohup ./IFtrend_NTickDel00_${ver_trend}_$var > /dev/null 2>&1  (wd: ~/ProductionEnv/Signal_IF/IFtrend_NTick_Del00/IF6720)
/home/xxsc/ProductionEnv/Signal_IF_trend.sh: line 30: 10022 Aborted                 (core dumped) nohup ./IFtrend_NTickDel00_${ver_trend}_$var > /dev/null 2>&1
/home/xxsc/ProductionEnv/Signal_IF_trend.sh: line 30: 10069 Aborted                 (core dumped) nohup ./IFtrend_NTickDel00_${ver_trend}_$var > /dev/null 2>&1  (wd: ~/ProductionEnv/Signal_IF/IFtrend_NTick_Del00/IF7113)
/home/xxsc/ProductionEnv/Signal_IF_trend.sh: line 30: 10070 Aborted                 (core dumped) nohup ./IFtrend_NTickDel00_${ver_trend}_$var > /dev/null 2>&1  (wd: ~/ProductionEnv/Signal_IF/IFtrend_NTick_Del00/IF7114)
/home/xxsc/ProductionEnv/Signal_IF_trend.sh: line 30: 10071 Aborted                 (core dumped) nohup ./IFtrend_NTickDel00_${ver_trend}_$var > /dev/null 2>&1  (wd: ~/ProductionEnv/Signal_IF/IFtrend_NTick_Del00/IF7115)
/home/xxsc/ProductionEnv/Signal_IF_trend.sh: line 30: 10072 Aborted                 (core dumped) nohup ./IFtrend_NTickDel00_${ver_trend}_$var > /dev/null 2>&1  (wd: ~/ProductionEnv/Signal_IF/IFtrend_NTick_Del00/IF7116)
/home/xxsc/ProductionEnv/Signal_IF_trend.sh: line 30: 10073 Aborted                 (core dumped) nohup ./IFtrend_NTickDel00_${ver_trend}_$var > /dev/null 2>&1  (wd: ~/ProductionEnv/Signal_IF/IFtrend_NTick_Del00/IF7312)
/home/xxsc/ProductionEnv/Signal_IF_trend.sh: line 30: 10074 Aborted                 (core dumped) nohup ./IFtrend_NTickDel00_${ver_trend}_$var > /dev/null 2>&1  (wd: ~/ProductionEnv/Signal_IF/IFtrend_NTick_Del00/IF7313)
/home/xxsc/ProductionEnv/Signal_IF_trend.sh: line 30: 10075 Aborted                 (core dumped) nohup ./IFtrend_NTickDel00_${ver_trend}_$var > /dev/null 2>&1  (wd: ~/ProductionEnv/Signal_IF/IFtrend_NTick_Del00/IF7314)
/home/xxsc/ProductionEnv/Signal_IF_trend.sh: line 30: 10076 Aborted                 (core dumped) nohup ./IFtrend_NTickDel00_${ver_trend}_$var > /dev/null 2>&1  (wd: ~/ProductionEnv/Signal_IF/IFtrend_NTick_Del00/IF7315)
/home/xxsc/ProductionEnv/Signal_IF_trend.sh: line 30: 10077 Aborted                 (core dumped) nohup ./IFtrend_NTickDel00_${ver_trend}_$var > /dev/null 2>&1  (wd: ~/ProductionEnv/Signal_IF/IFtrend_NTick_Del00/IF7316)
/home/xxsc/ProductionEnv/Signal_IF_trend.sh: line 30: 10078 Aborted                 (core dumped) nohup ./IFtrend_NTickDel00_${ver_trend}_$var > /dev/null 2>&1
/home/xxsc/ProductionEnv/Signal_IF_trend.sh: line 30: 10104 Aborted                 (core dumped) nohup ./IFtrend_NTickDel00_${ver_trend}_$var > /dev/null 2>&1  (wd: ~/ProductionEnv/Signal_IF/IFtrend_NTick_Del00/IF7612)
/home/xxsc/ProductionEnv/Signal_IF_trend.sh: line 30: 10105 Aborted                 (core dumped) nohup ./IFtrend_NTickDel00_${ver_trend}_$var > /dev/null 2>&1  (wd: ~/ProductionEnv/Signal_IF/IFtrend_NTick_Del00/IF7613)
/home/xxsc/ProductionEnv/Signal_IF_trend.sh: line 30: 10106 Aborted                 (core dumped) nohup ./IFtrend_NTickDel00_${ver_trend}_$var > /dev/null 2>&1  (wd: ~/ProductionEnv/Signal_IF/IFtrend_NTick_Del00/IF7614)
/home/xxsc/ProductionEnv/Signal_IF_trend.sh: line 30: 10107 Aborted                 (core dumped) nohup ./IFtrend_NTickDel00_${ver_trend}_$var > /dev/null 2>&1  (wd: ~/ProductionEnv/Signal_IF/IFtrend_NTick_Del00/IF7615)
/home/xxsc/ProductionEnv/Signal_IF_trend.sh: line 30: 10108 Aborted                 (core dumped) nohup ./IFtrend_NTickDel00_${ver_trend}_$var > /dev/null 2>&1  (wd: ~/ProductionEnv/Signal_IF/IFtrend_NTick_Del00/IF7616)
/home/xxsc/ProductionEnv/Signal_IF_trend.sh: line 30: 10109 Aborted                 (core dumped) nohup ./IFtrend_NTickDel00_${ver_trend}_$var > /dev/null 2>&1  (wd: ~/ProductionEnv/Signal_IF/IFtrend_NTick_Del00/IF7914)
/home/xxsc/ProductionEnv/Signal_IF_trend.sh: line 30: 10110 Aborted                 (core dumped) nohup ./IFtrend_NTickDel00_${ver_trend}_$var > /dev/null 2>&1  (wd: ~/ProductionEnv/Signal_IF/IFtrend_NTick_Del00/IF7915)
/home/xxsc/ProductionEnv/Signal_IF_trend.sh: line 30: 10111 Aborted                 (core dumped) nohup ./IFtrend_NTickDel00_${ver_trend}_$var > /dev/null 2>&1  (wd: ~/ProductionEnv/Signal_IF/IFtrend_NTick_Del00/IF7916)
/home/xxsc/ProductionEnv/Signal_IF_trend.sh: line 30: 10112 Aborted                 (core dumped) nohup ./IFtrend_NTickDel00_${ver_trend}_$var > /dev/null 2>&1  (wd: ~/ProductionEnv/Signal_IF/IFtrend_NTick_Del00/IF7917)
/home/xxsc/ProductionEnv/Signal_IF_trend.sh: line 30: 10113 Aborted                 (core dumped) nohup ./IFtrend_NTickDel00_${ver_trend}_$var > /dev/null 2>&1

Tags: linux, bash

Solution


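In the end the problem was solved. The cause was the TasksMax setting on cron.service. Every process started from a cron job stays in the cron.service control group, and TasksMax caps the number of tasks in that group, counting threads as well as processes. Once the cap was reached, the programs could no longer create threads, which matches the std::thread failure in the core dump.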
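
To see how many tasks a group of processes really consumes, their thread counts can be added up. The following is a rough sketch that reads the `Threads:` field of /proc/<pid>/status; `sum_threads` is a hypothetical helper name, and the cgroup path in the usage note assumes the cgroup v1 layout used by Ubuntu 18.04.

```shell
#!/bin/bash
# sum_threads: hypothetical helper. Prints the total number of threads owned
# by the PIDs given as arguments, using the "Threads:" field of
# /proc/<pid>/status (Linux only).
sum_threads() {
    local total=0
    local pid n
    for pid in "$@"; do
        # A process may exit between listing and reading; count it as 0.
        n=$(awk '/^Threads:/ {print $2}' "/proc/$pid/status" 2>/dev/null)
        total=$(( total + ${n:-0} ))
    done
    echo "$total"
}
```

For example, `sum_threads $(cat /sys/fs/cgroup/pids/system.slice/cron.service/cgroup.procs)` would add up the threads of everything cron has started, assuming that path exists on the machine.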

Resolution on Ubuntu Server 18.04

Edit the system.conf file in /etc/systemd

sudo nano /etc/systemd/system.conf

Add the following line to the end of system.conf, so that it falls under the [Manager] section

DefaultTasksMax=100000

Reboot the server
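
DefaultTasksMax changes the default for every unit on the machine. A narrower option, not part of the original answer, is a drop-in file that sets TasksMax for cron.service only. A minimal sketch, where `write_tasksmax_dropin` is a hypothetical helper name and 100000 simply mirrors the value used above:

```shell
#!/bin/bash
# write_tasksmax_dropin: hypothetical helper. Writes a systemd drop-in file
# that sets TasksMax for a single unit.
# Usage: write_tasksmax_dropin <dropin_dir> <limit>
write_tasksmax_dropin() {
    local dir="$1"
    local limit="$2"
    mkdir -p "$dir" || return 1
    printf '[Service]\nTasksMax=%s\n' "$limit" > "$dir/override.conf"
}

# Example, to be run as root and followed by a reload and a cron restart:
#   write_tasksmax_dropin /etc/systemd/system/cron.service.d 100000
#   systemctl daemon-reload
#   systemctl restart cron
```

The helper takes the directory as a parameter so it can be tried against a scratch directory before touching /etc/systemd.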

How to check

The command "systemctl status cron" shows the following information

Before modifying system.conf

xxsc@Strategy2:/etc/systemd$ sudo systemctl service cron
Unknown operation service.
xxsc@Strategy2:/etc/systemd$ sudo systemctl status cron
● cron.service - Regular background program processing daemon
   Loaded: loaded (/lib/systemd/system/cron.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2020-08-29 15:26:17 CST; 3 days ago
     Docs: man:cron(8)
 Main PID: 966 (cron)
    Tasks: 4743 (limit: 4915)
   CGroup: /system.slice/cron.service
           ├─  966 /usr/sbin/cron -f
           ├─ 4322 ./prodTickRecorderv057.15
           ├─ 4355 ./Comodity_AP_NTime_Del00_v0533_0212
           ├─ 4356 ./Comodity_AP_NTime_Del00_v0533_0306
           ...

Note the line "Tasks: 4743 (limit: 4915)": the task count is close to the limit.

After modifying system.conf

xxsc@Strategy2:~$ systemctl status cron
● cron.service - Regular background program processing daemon
   Loaded: loaded (/lib/systemd/system/cron.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2020-09-02 15:47:46 CST; 2min 26s ago
     Docs: man:cron(8)
 Main PID: 1069 (cron)
    Tasks: 1 (limit: 100000)
   CGroup: /system.slice/cron.service
           └─1069 /usr/sbin/cron -f
           ...
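
To catch this before processes start failing again, the Tasks line can be turned into a headroom figure. A small sketch, assuming the `Tasks: N (limit: M)` format shown in the output above; `tasks_headroom` is a hypothetical helper name.

```shell
#!/bin/bash
# tasks_headroom: hypothetical helper. Reads `systemctl status <unit>` output
# on stdin and prints "current limit headroom" taken from the "Tasks:" line.
# Prints nothing if no such line is present.
tasks_headroom() {
    sed -n 's/^ *Tasks: *\([0-9][0-9]*\) (limit: *\([0-9][0-9]*\)).*/\1 \2/p' |
    while read -r cur max; do
        echo "$cur $max $(( max - cur ))"
    done
}
```

Typical use would be `systemctl status cron | tasks_headroom`, with the third field compared against a threshold of your choosing.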
