Nagios: automatically repairing a service found in a bad state

Question

Can Nagios be configured to run a command or script when it finds a service in a bad state?

Tags: nagios

Answer


Yes, this can be done with an event handler. Here is an example service definition:

define service {
    host_name               somehost
    service_description     HTTP
    max_check_attempts      4
    event_handler           restart-httpd
    ...
}

The command definition:

define command {
    command_name    restart-httpd
    command_line    /usr/local/nagios/libexec/eventhandlers/restart-httpd  $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$
}
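
Nagios expands the three macros in command_line before running the command, so the script receives them as the positional parameters $1, $2 and $3: the service state (OK, WARNING, UNKNOWN or CRITICAL), the state type (SOFT or HARD) and the current check attempt number. The sketch below only illustrates that mapping; handler_args is a hypothetical stand-in, not part of Nagios, and the call at the end simulates by hand what Nagios would pass on the third soft critical check.

```shell
#!/bin/sh
# Hypothetical stand-in for an event handler: it only reports what it received.
#   $1 <- $SERVICESTATE$      (OK, WARNING, UNKNOWN, CRITICAL)
#   $2 <- $SERVICESTATETYPE$  (SOFT, HARD)
#   $3 <- $SERVICEATTEMPT$    (current check attempt number)
handler_args() {
    printf 'state=%s type=%s attempt=%s\n' "$1" "$2" "$3"
}

# Simulate the invocation for the 3rd soft critical check.
handler_args CRITICAL SOFT 3
```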

The restart-httpd script:

#!/bin/sh
# What state is the HTTP service in?
case "$1" in
OK)
    # The service just came back up, so don't do anything...
    ;;
WARNING)
    # We don't really care about warning states, since the service is probably still running...
    ;;
UNKNOWN)
    # We don't know what might be causing an unknown error, so don't do anything...
    ;;
CRITICAL)
    # Aha!  The HTTP service appears to have a problem - perhaps we should restart the server...
    # Is this a "soft" or a "hard" state?
    case "$2" in

    # We're in a "soft" state, meaning that Nagios is in the middle of retrying the
    # check before it turns into a "hard" state and contacts get notified...
    SOFT)

        # What check attempt are we on?  We don't want to restart the web server on the first
        # check, because it may just be a fluke!
        case "$3" in

        # Wait until the check has been tried 3 times before restarting the web server.
        # If the check fails on the 4th time (after we restart the web server), the state
        # type will turn to "hard" and contacts will be notified of the problem.
        # Hopefully this will restart the web server successfully, so the 4th check will
        # result in a "soft" recovery.  If that happens no one gets notified because we
        # fixed the problem!
        3)
            # printf is used instead of "echo -n", which is not portable across /bin/sh implementations
            printf '%s' "Restarting HTTP service (3rd soft critical state)..."
            # Call the init script to restart the HTTPD server
            /etc/rc.d/init.d/httpd restart
            ;;
        esac
        ;;

    # The HTTP service somehow managed to turn into a hard error without getting fixed.
    # It should have been restarted by the code above, but for some reason that didn't work.
    # Let's give it one last try, shall we?
    # Note: Contacts have already been notified of a problem with the service at this
    # point (unless you disabled notifications for this service)
    HARD)
        printf '%s' "Restarting HTTP service..."
        # Call the init script to restart the HTTPD server
        /etc/rc.d/init.d/httpd restart
        ;;
    esac
    ;;
esac
exit 0
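
Before attaching the handler to a live service, it can help to check which state combinations would trigger a restart. The sketch below mirrors the script's case logic in a dry-run function that only prints a decision and never touches the web server. The name should_restart is hypothetical and is not part of Nagios.

```shell
#!/bin/sh
# Hypothetical dry-run helper mirroring the handler's decision logic.
# Prints "restart" when the handler above would restart httpd, otherwise "skip".
should_restart() {
    state=$1 state_type=$2 attempt=$3
    case "$state" in
    CRITICAL)
        case "$state_type" in
        SOFT)
            # Only the 3rd soft critical check triggers a restart.
            [ "$attempt" = "3" ] && { echo restart; return 0; }
            ;;
        HARD)
            # Last-resort restart once the state has turned hard.
            echo restart
            return 0
            ;;
        esac
        ;;
    esac
    # OK, WARNING, UNKNOWN and early soft critical attempts do nothing.
    echo skip
}

# Walk through the soft attempts, then the hard state.
for attempt in 1 2 3; do
    printf 'CRITICAL SOFT %s -> %s\n' "$attempt" "$(should_restart CRITICAL SOFT "$attempt")"
done
printf 'CRITICAL HARD 4 -> %s\n' "$(should_restart CRITICAL HARD 4)"
```

With max_check_attempts set to 4 as in the service definition, only the third soft attempt and the hard state produce "restart". Note that the real handler normally runs as the user Nagios runs under, so the restart command may need additional privileges on your system.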
