Kafka Connect and HDFS in Docker

Problem description

I am running the Kafka Connect HDFS sink together with Hadoop (for HDFS) in docker-compose.

Hadoop (namenode and datanode) seems to be working fine.

But I get an error from the Kafka Connect sink:

    ERROR Recovery failed at state RECOVERY_PARTITION_PAUSED 
    (io.confluent.connect.hdfs.TopicPartitionWriter:277) 
    org.apache.kafka.connect.errors.DataException: 
    Error creating writer for log file hdfs://namenode:8020/logs/MyTopic/0/log

For reference:

And here is my kafka-connect properties file:

    name=hdfs-sink
    connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
    tasks.max=1
    topics=MyTopic
    hdfs.url=hdfs://namenode:8020
    flush.size=3

Edit:

I added an env variable so that kafka-connect knows the cluster name (env variable CLUSTER_NAME, added to the kafka-connect service in the docker-compose file, as sketched below).
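As a rough sketch, the kafka-connect service in docker-compose.yml could then look like this (the image, service name and cluster name here are assumptions; the CLUSTER_NAME variable is the only relevant change):

    kafka-connect:
      image: confluentinc/cp-kafka-connect   # assumed image; use whatever image the stack already runs
      environment:
        CLUSTER_NAME: "hadoop-cluster"       # assumed value: name of the HDFS cluster
        # ...existing CONNECT_* variables stay unchanged
      depends_on:
        - namenode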

The error is now different (it seems one problem was solved):

    INFO Starting commit and rotation for topic partition scoring-topic-0 with start offsets {partition=0=0} and end offsets {partition=0=2} 
    (io.confluent.connect.hdfs.TopicPartitionWriter:368)
    ERROR Exception on topic partition MyTopic-0: (io.confluent.connect.hdfs.TopicPartitionWriter:403)
    org.apache.kafka.connect.errors.DataException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): 
    File /topics/+tmp/MyTopic/partition=0/bc4cf075-ccfa-4338-9672-5462cc6c3404_tmp.avro 
    could only be replicated to 0 nodes instead of minReplication (=1).  
    There are 1 datanode(s) running and 1 node(s) are excluded in this operation.

Edit 2:

The hadoop.env file is:

    CORE_CONF_fs_defaultFS=hdfs://namenode:8020

    # Configure default BlockSize and Replication for local
    # data. Keep it small for experimentation.
    HDFS_CONF_dfs_blocksize=1m

    YARN_CONF_yarn_log___aggregation___enable=true
    YARN_CONF_yarn_resourcemanager_recovery_enabled=true
    YARN_CONF_yarn_resourcemanager_store_class=org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore
    YARN_CONF_yarn_resourcemanager_fs_state___store_uri=/rmstate
    YARN_CONF_yarn_nodemanager_remote___app___log___dir=/app-logs

    YARN_CONF_yarn_log_server_url=http://historyserver:8188/applicationhistory/logs/
    YARN_CONF_yarn_timeline___service_enabled=true
    YARN_CONF_yarn_timeline___service_generic___application___history_enabled=true
    YARN_CONF_yarn_resourcemanager_system___metrics___publisher_enabled=true

    YARN_CONF_yarn_resourcemanager_hostname=resourcemanager
    YARN_CONF_yarn_timeline___service_hostname=historyserver
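For context: env files in this format are used by Hadoop images that generate their *-site.xml files from environment variables (for example the bde2020 docker-hadoop images, assuming that is the base image here). The prefix selects the file (CORE_CONF_ → core-site.xml, HDFS_CONF_ → hdfs-site.xml, YARN_CONF_ → yarn-site.xml), a single underscore becomes a dot and a triple underscore becomes a dash, so HDFS_CONF_dfs_blocksize=1m would be rendered roughly as:

    <!-- generated into hdfs-site.xml from HDFS_CONF_dfs_blocksize=1m -->
    <property>
        <name>dfs.blocksize</name>
        <value>1m</value>
    </property>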

Tags: docker, hadoop, apache-kafka, hdfs, apache-kafka-connect

Solution


In the end, as @cricket_007 pointed out, I needed to configure hadoop.conf.dir.

That directory should contain hdfs-site.xml.
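For illustration only, a minimal hdfs-site.xml for a single-datanode setup like this one might look roughly as follows (the property shown is an assumption; in practice the file generated inside the namenode container is reused via the shared volume described below):

    <configuration>
        <!-- assumed: only one datanode, so a single replica is enough -->
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
    </configuration>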

Since every service is dockerized, I needed to create a named volume to share the configuration files between the kafka-connect service and the namenode service.

To do this, I added to my docker-compose.yml:

    volumes:
      hadoopconf:

Then, for the namenode service, I added:

    volumes:
      - hadoopconf:/etc/hadoop

And for the kafka-connect service:

    volumes:
      - hadoopconf:/usr/local/hadoop-conf

Finally, I set hadoop.conf.dir to /usr/local/hadoop-conf in my HDFS sink properties file.
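Putting it together, the HDFS sink properties file from the question gains one line (everything except hadoop.conf.dir is unchanged from above):

    name=hdfs-sink
    connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
    tasks.max=1
    topics=MyTopic
    hdfs.url=hdfs://namenode:8020
    flush.size=3
    hadoop.conf.dir=/usr/local/hadoop-conf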

