
Problem description

I'm working on a Docker Swarm infrastructure with one (and only one) Kafka broker per node. Everything works fine, with no error or warning in the logs, but when I try to reach the second broker (worker2) I get an error. Look at the output:

kafkacat -L -b worker2:9094                                                                                                                     
% ERROR: Failed to acquire metadata: Local: Timed out

The expected output, which I do get from worker1, is:

kafkacat -L -b worker1:9094                                                                                                                      
Metadata for all topics (from broker 1001: worker1:9094/1001):
 2 brokers:
  broker 1001 at worker1:9094
  broker 1002 at worker2:9094
 1 topics:
  topic "logging_application_access" with 1 partitions:
    partition 0, leader 1001, replicas: 1001, isrs: 1001

The output of listing my nodes is:

ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
c6msb82zav2p1lepd13phmeei *   manager1            Ready               Active              Leader              18.06.1-ce
n3sfqz4rgewtulz43q5qobmr1     worker1             Ready               Active                                  18.06.1-ce
xgkibsp0kx29bhmjkwysapa6h     worker2             Ready               Active                                  18.06.1-ce  

For a better understanding, here is my docker-compose.yml file:

version: '3.6'

x-proxy: &proxy
  http_proxy: ${http_proxy}
  https_proxy: ${https_proxy}
  no_proxy: ${no_proxy}

services:
  zookeeper:
    image: zookeeper:3.4.13
    hostname: zookeeper
    volumes:
      - type: volume
        source: zookeeper-data
        target: /data
    environment:
      <<: *proxy
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
    networks:
      - workshop
    restart: always
    deploy:
      replicas: 2
      placement:
        constraints:
          - node.role == worker

  kafka:
    image: wurstmeister/kafka:2.12-2.1.0
    hostname: kafka
    volumes:
      - type: volume
        source: kafka-data
        target: /kafka
      - type: bind
        source: /var/run/docker.sock
        target: /var/run/docker.sock
        read_only: true
    env_file: ./services/kafka/.env
    environment:
      <<: *proxy
    ports:
      - target: 9094
        published: 9094
        protocol: tcp
        mode: host
    networks:
      - workshop
    restart: always
    depends_on:
      - zookeeper
    deploy:
      mode: global
      placement:
        constraints:
          - node.role == worker

volumes:
  zookeeper-data:
    driver: local
  kafka-data:
    driver: local

networks:
  workshop:
    name: workshop
    external: true

And finally, the environment file:

HOSTNAME_COMMAND=docker info | grep ^Name: | cut -d ' ' -f 2

KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT
KAFKA_ADVERTISED_LISTENERS=INSIDE://:9092,OUTSIDE://_{HOSTNAME_COMMAND}:9094
KAFKA_LISTENERS=INSIDE://:9092,OUTSIDE://:9094
KAFKA_INTER_BROKER_LISTENER_NAME=INSIDE

KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
KAFKA_ZOOKEEPER_CONNECTION_TIMEOUT_MS=36000
KAFKA_ZOOKEEPER_SESSION_TIMEOUT_MS=36000

KAFKA_LOG_RETENTION_HOURS=24
KAFKA_AUTO_CREATE_TOPICS_ENABLE=false
KAFKA_CREATE_TOPICS=logging_application_access:1:1

KAFKA_JMX_OPTS=-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=127.0.0.1 -Dcom.sun.management.jmxremote.rmi.port=1099
JMX_PORT=1099
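For context on the OUTSIDE listener: the wurstmeister/kafka image replaces the `_{HOSTNAME_COMMAND}` placeholder in environment variables with the output of `HOSTNAME_COMMAND` at container start. A rough simulation of that substitution, with `worker1` as a stand-in for what `docker info` would report on that node:

```shell
# Stand-in for: docker info | grep ^Name: | cut -d ' ' -f 2
HOSTNAME_RESULT="worker1"

# The listener string as written in the .env file
ADVERTISED='INSIDE://:9092,OUTSIDE://_{HOSTNAME_COMMAND}:9094'

# Substitute the placeholder the way the image's startup script does
echo "${ADVERTISED//_\{HOSTNAME_COMMAND\}/$HOSTNAME_RESULT}"
# prints: INSIDE://:9092,OUTSIDE://worker1:9094
```

So each broker advertises its own node hostname on port 9094, which is why `worker2:9094` should be reachable from outside the swarm.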

I've been researching a solution for this, so far without any success.

Tags: docker, apache-kafka, docker-swarm

Solution


Does it start working if you scale ZooKeeper down to a single instance? In my experience, if you want to run ZooKeeper in high-availability mode you need to list the whole quorum explicitly in the ZooKeeper connection string, which doesn't play well with replicated services in Docker. So either run just one ZooKeeper node, or create a separate service for each node in your compose file (i.e. zookeeper1, zookeeper2, zookeeper3) and list them all in the ZooKeeper connect variable, i.e. KAFKA_ZOOKEEPER_CONNECT=zookeeper1:2181,zookeeper2:2181,zookeeper3:2181.
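A minimal sketch of the separate-service approach with the official `zookeeper` image (the service names and the `ZOO_MY_ID`/`ZOO_SERVERS` values are illustrative, not taken from the question's setup; ports, volumes, and placement constraints are omitted for brevity):

```yaml
services:
  zookeeper1:
    image: zookeeper:3.4.13
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
    networks:
      - workshop
  zookeeper2:
    image: zookeeper:3.4.13
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
    networks:
      - workshop
  zookeeper3:
    image: zookeeper:3.4.13
    environment:
      ZOO_MY_ID: 3
      ZOO_SERVERS: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
    networks:
      - workshop
```

Each service gets a stable DNS name on the overlay network, so the Kafka side can then point at the full quorum with `KAFKA_ZOOKEEPER_CONNECT=zookeeper1:2181,zookeeper2:2181,zookeeper3:2181`.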

You could try the tasks.zookeeper dnsrr address, but in my experience it does not resolve correctly to the list of containers behind the service.

FYI, you get no benefit from running two ZooKeeper nodes: ZooKeeper needs more than half of the ensemble's nodes to be up, so you need at least three nodes to have any fault tolerance.
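The quorum arithmetic behind that last point: an ensemble of n servers needs a strict majority to be up, so it tolerates floor((n - 1) / 2) failures, which is zero for both one and two nodes:

```shell
# Majority-based fault tolerance: an ensemble of n ZooKeeper servers
# survives floor((n - 1) / 2) server failures.
for n in 1 2 3 4 5; do
  echo "ensemble of $n node(s) tolerates $(( (n - 1) / 2 )) failure(s)"
done
```

Note that four nodes tolerate only one failure, same as three, which is why odd-sized ensembles are the usual recommendation.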
