Redis Sentinel does not escalate SDOWN to ODOWN

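Problem description

I need help understanding what is going wrong.

I have deployed Redis in a Kubernetes environment with 1 master, 2 replicas and 3 sentinels. I am using the redis 6.2.3 alpine image. Every redis and sentinel instance runs in its own pod.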

NAME                READY   STATUS    RESTARTS   AGE   IP              NODE   NOMINATED NODE   READINESS GATES
redis-0             1/1     Running   0          31m   10.233.64.143   vm1    <none>           <none>
redis-1             1/1     Running   0          34m   10.233.64.90    vm1    <none>           <none>
redis-2             1/1     Running   0          34m   10.233.64.40    vm1    <none>           <none>
sentinel-0          1/1     Running   0          34m   10.233.64.93    vm1    <none>           <none>
sentinel-1          1/1     Running   0          34m   10.233.64.35    vm1    <none>           <none>
sentinel-2          1/1     Running   0          34m   10.233.64.34    vm1    <none>           <none>

I have also created headless services for the redis and sentinel pods, which let me reach a specific pod behind each service.

    [root@master-1 ~]# kubectl describe svc sentinel -n ankit
    Name:              sentinel
    Namespace:         ankit
    Labels:            <none>
    Annotations:       <none>
    Selector:          app=sentinel
    Type:              ClusterIP
    IP:                None
    Port:              sentinel  5000/TCP
    TargetPort:        5000/TCP
    Endpoints:         10.233.64.34:5000,10.233.64.35:5000,10.233.64.93:5000
    Session Affinity:  None

[root@master-1 ~]# kubectl describe svc redis -n ankit
Name:              redis
Namespace:         ankit
Labels:            <none>
Annotations:       <none>
Selector:          app=redis
Type:              ClusterIP
IP:                None
Port:              redis  6379/TCP
TargetPort:        6379/TCP
Endpoints:         10.233.64.143:6379,10.233.64.40:6379,10.233.64.90:6379
Session Affinity:  None
Events:            <none>
[root@master-1 ~]#
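
Because both services are headless (`IP: None`), each StatefulSet pod gets its own stable DNS name, which is the form used for the master address later in this question. As a minimal sketch, the names can be built as follows. The helper names are hypothetical, and the sketch assumes the StatefulSets are named `redis` and `sentinel`, reference the services of the same name, and run under the default `cluster.local` cluster domain.

```python
from typing import List


def statefulset_pod_names(statefulset: str, replicas: int) -> List[str]:
    """StatefulSet pods are named <statefulset>-<ordinal>, starting at 0."""
    return [f"{statefulset}-{i}" for i in range(replicas)]


def pod_dns_name(pod: str, service: str, namespace: str,
                 cluster_domain: str = "cluster.local") -> str:
    """Build the per-pod DNS name exposed through a headless service.

    Assumption: the pod belongs to a StatefulSet whose serviceName is
    `service`; otherwise no per-pod record exists.
    """
    return f"{pod}.{service}.{namespace}.svc.{cluster_domain}"


if __name__ == "__main__":
    # Namespace "ankit" is taken from the kubectl output above.
    for pod in statefulset_pod_names("redis", 3):
        print(pod_dns_name(pod, "redis", "ankit"))
```

The name stays the same when a pod is recreated, while the pod IP usually changes.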

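When deploying the redis StatefulSet, I added logic to the init container in the redis YAML so that the redis-0 pod becomes the master by default. All pods start and run fine, and all three sentinels can connect to the master and to each other. However, when I delete the redis master pod, all three sentinels log an SDOWN event, but it is never escalated to ODOWN, so no failover takes place. When redis-0 then comes back as a replica, the sentinels cannot elect a new master, and because there is no master the cluster is left in an error state.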
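
For context, the escalation rule can be sketched like this: a sentinel that itself sees the master as subjectively down (SDOWN) marks it objectively down (ODOWN) only once the number of sentinels agreeing, itself included, reaches the configured quorum. The function and argument names below are hypothetical, and the model ignores the time window in which peer replies remain valid. It illustrates the rule only and does not diagnose this deployment.

```python
from typing import Dict


def is_objectively_down(local_sdown: bool,
                        peer_says_down: Dict[str, bool],
                        quorum: int) -> bool:
    """Simplified model of the SDOWN -> ODOWN decision on one sentinel.

    local_sdown    -- this sentinel currently flags the master as SDOWN
    peer_says_down -- latest answer from each other sentinel
                      (True means "I also see the master as down")
    quorum         -- last argument of `sentinel monitor`
    """
    if not local_sdown:
        # Without a local SDOWN there is nothing to escalate.
        return False
    # The local sentinel counts as one vote.
    agreeing = 1 + sum(1 for down in peer_says_down.values() if down)
    return agreeing >= quorum


if __name__ == "__main__":
    # Quorum 2 as in the posted sentinel configuration.
    print(is_objectively_down(True, {"sentinel-1": True, "sentinel-2": False}, 2))
    print(is_objectively_down(True, {"sentinel-1": False, "sentinel-2": False}, 2))
```

In this model, a sentinel whose peers never confirm the failure stays at one vote and remains in SDOWN, which matches the symptom described here without explaining why the confirmations are missing.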

Sentinel-0 output

sentinel-0 log after the redis master was deleted:

1:X 15 Oct 2021 02:13:52.155 * +fix-slave-config slave 10.233.64.40:6379 10.233.64.40 6379 @ mymaster redis-0.redis.ankit.svc.cluster.local 6379
1:X 15 Oct 2021 02:13:52.322 * +fix-slave-config slave 10.233.64.90:6379 10.233.64.90 6379 @ mymaster redis-0.redis.ankit.svc.cluster.local 6379
1:X 15 Oct 2021 02:13:53.194 * +fix-slave-config slave 10.233.64.40:6379 10.233.64.40 6379 @ mymaster redis-0.redis.ankit.svc.cluster.local 6379
1:X 15 Oct 2021 02:13:53.338 * +fix-slave-config slave 10.233.64.90:6379 10.233.64.90 6379 @ mymaster redis-0.redis.ankit.svc.cluster.local 6379
1:X 15 Oct 2021 02:13:54.203 * +fix-slave-config slave 10.233.64.40:6379 10.233.64.40 6379 @ mymaster redis-0.redis.ankit.svc.cluster.local 6379
1:X 15 Oct 2021 02:13:54.399 * +fix-slave-config slave 10.233.64.90:6379 10.233.64.90 6379 @ mymaster redis-0.redis.ankit.svc.cluster.local 6379
1:X 15 Oct 2021 02:13:54.635 # +sdown master mymaster redis-0.redis.ankit.svc.cluster.local 6379
1:X 15 Oct 2021 02:14:00.040 - Accepted 10.233.64.143:33288
1:X 15 Oct 2021 02:14:00.047 - Client closed connection

sentinel-1 log after the redis master pod was deleted:

1:X 15 Oct 2021 02:11:10.200 . Rewritten config file (/etc/redis/sentinel.conf) successfully
1:X 15 Oct 2021 02:13:54.600 # +sdown master mymaster redis-0.redis.ankit.svc.cluster.local 6379
1:X 15 Oct 2021 02:14:00.054 - Accepted 10.233.64.143:48550
1:X 15 Oct 2021 02:14:00.055 - Client closed connection

sentinel-2 log after the redis master pod was deleted:

1:X 15 Oct 2021 02:11:09.858 . Rewritten config file (/etc/redis/sentinel.conf) successfully
1:X 15 Oct 2021 02:11:10.244 - Accepted 10.233.64.93:35181
1:X 15 Oct 2021 02:11:10.264 - Accepted 10.233.64.35:56403
1:X 15 Oct 2021 02:13:54.636 # +sdown master mymaster redis-0.redis.ankit.svc.cluster.local 6379

As the logs show, the SDOWN event is never escalated to ODOWN, so no failover follows.
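
The same check can be run over longer logs with a small script. This is a rough sketch that assumes the log line layout shown above (`pid:X date time level event details`). The function names are hypothetical, and the sample text is copied from the sentinel-0 log.

```python
import re
from typing import List

# Matches lines such as:
# 1:X 15 Oct 2021 02:13:54.635 # +sdown master mymaster <host> 6379
EVENT_RE = re.compile(
    r"^\d+:X \d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2}\.\d{3} [.\-*#] "
    r"(?P<event>[+-][a-z-]+) (?P<detail>.*)$"
)


def master_events(log_text: str, master_name: str) -> List[str]:
    """Return the sentinel events that refer to the given master itself."""
    found = []
    for line in log_text.splitlines():
        match = EVENT_RE.match(line.strip())
        if match and match.group("detail").startswith(f"master {master_name} "):
            found.append(match.group("event"))
    return found


def sdown_without_odown(log_text: str, master_name: str) -> bool:
    """True if +sdown was logged for the master but +odown never was."""
    events = master_events(log_text, master_name)
    return "+sdown" in events and "+odown" not in events


SAMPLE = """\
1:X 15 Oct 2021 02:13:54.399 * +fix-slave-config slave 10.233.64.90:6379 10.233.64.90 6379 @ mymaster redis-0.redis.ankit.svc.cluster.local 6379
1:X 15 Oct 2021 02:13:54.635 # +sdown master mymaster redis-0.redis.ankit.svc.cluster.local 6379
1:X 15 Oct 2021 02:14:00.040 - Accepted 10.233.64.143:33288
"""

if __name__ == "__main__":
    print(sdown_without_odown(SAMPLE, "mymaster"))
```

The sketch only looks at events whose subject is the master, so replica events such as `+fix-slave-config` are ignored, and it does not track a later `-sdown` that would clear the condition.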

The redis and sentinel configuration files are attached below.

Redis configuration file:

masterauth password
requirepass password
bind 0.0.0.0
protected-mode no
port 6379
tcp-backlog 511
# Close the connection after a client is idle for N seconds (0 to disable)
timeout 0
tcp-keepalive 300
daemonize no
supervised no
pidfile "/var/run/redis_6379.pid"
loglevel debug
logfile ""
databases 16
always-show-logo yes
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename "dump.rdb"
rdb-del-sync-files no
dir "/data"
replica-serve-stale-data yes
replica-read-only yes
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-diskless-load disabled
repl-disable-tcp-nodelay no
replica-priority 100
acllog-max-len 128
maxclients 9000

lazyfree-lazy-eviction no
lazyfree-lazy-expire no
lazyfree-lazy-server-del no
replica-lazy-flush no

lazyfree-lazy-user-del no
appendonly yes
appendfilename "appendonly.aof"

appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble yes
lua-time-limit 5000
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-size -2

list-compress-depth 0
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000

stream-node-max-bytes 4kb
stream-node-max-entries 100

activerehashing yes

client-output-buffer-limit normal 0 0 0
client-output-buffer-limit replica 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
dynamic-hz yes
aof-rewrite-incremental-fsync yes
rdb-save-incremental-fsync yes

# Jemalloc background thread for purging will be enabled by default
jemalloc-bg-thread yes

slaveof redis-0.redis.ankit.svc.cluster.local 6379

Sentinel configuration file:

port 5000
daemonize no
protected-mode no
bind 0.0.0.0
acllog-max-len 128
sentinel deny-scripts-reconfig yes
sentinel resolve-hostnames yes
sentinel announce-hostnames yes
sentinel monitor mymaster redis-0.redis.ankit.svc.cluster.local 6379 2
sentinel down-after-milliseconds mymaster 4000
sentinel failover-timeout mymaster 2000

sentinel auth-pass mymaster password
maxclients 9000
loglevel debug

# Generated by CONFIG REWRITE
user default on nopass ~* &* +@all
dir "/data"
sentinel myid c7d1f666d94b7ab0a05701c83ccd1246d2628ca1
sentinel config-epoch mymaster 0
sentinel leader-epoch mymaster 0
sentinel current-epoch 0
sentinel known-replica mymaster 10.233.64.90 6379
sentinel known-replica mymaster 10.233.64.40 6379
sentinel known-sentinel mymaster 10.233.64.34 5000 6e5e0ecf8551c21b543815c966a19a54809677c4
sentinel known-sentinel mymaster 10.233.64.35 5000 3f4493c38c5514d76f2eb698aed9c0b6ba550be9
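
As a sanity check on the file above, the quorum can be compared with the number of sentinels this instance knows about. The sketch below is a simple line parser written for this layout, not an official tool. The function name is hypothetical, and it assumes each `known-sentinel` line is a distinct peer and that the local sentinel counts as one more.

```python
from typing import Dict, Optional


def summarize_sentinel_conf(conf_text: str, master_name: str) -> Dict[str, Optional[int]]:
    """Extract the quorum and the sentinel count for one monitored master."""
    quorum = None
    known_sentinels = 0
    for raw in conf_text.splitlines():
        parts = raw.split()
        # Only lines of the form "sentinel <directive> <master_name> ..." matter.
        if len(parts) < 3 or parts[0] != "sentinel" or parts[2] != master_name:
            continue
        if parts[1] == "monitor" and len(parts) == 6:
            # sentinel monitor <name> <host> <port> <quorum>
            quorum = int(parts[5])
        elif parts[1] == "known-sentinel":
            known_sentinels += 1
    total = known_sentinels + 1  # peers plus the local sentinel
    return {"quorum": quorum, "sentinels": total, "majority": total // 2 + 1}


# Lines copied from the configuration file above.
CONF = """\
sentinel monitor mymaster redis-0.redis.ankit.svc.cluster.local 6379 2
sentinel down-after-milliseconds mymaster 4000
sentinel known-replica mymaster 10.233.64.90 6379
sentinel known-sentinel mymaster 10.233.64.34 5000 6e5e0ecf8551c21b543815c966a19a54809677c4
sentinel known-sentinel mymaster 10.233.64.35 5000 3f4493c38c5514d76f2eb698aed9c0b6ba550be9
"""

if __name__ == "__main__":
    print(summarize_sentinel_conf(CONF, "mymaster"))
```

For the posted file this reports a quorum of 2 with 3 sentinels and a majority of 2, so the quorum value itself is reachable and does not by itself account for the missing ODOWN.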

Tags: redis-cluster, redis-sentinel
