Apache Kafka: how to rebalance a topic across JBOD disks

Problem description

We have three Kafka broker machines installed on RHEL 7.6 Linux.

The Kafka version is 2.7.x.

Each Kafka broker has 8 JBOD disks, as shown below (df -h output):

df -h

/dev/sdc                    1.7T  929G  748G  56% /kafka/kafka_logs2
/dev/sdd                    1.7T  950G  727G  57% /kafka/kafka_logs3
/dev/sde                    1.7T  999G  678G  60% /kafka/kafka_logs4
/dev/sdf                    1.7T  971G  706G  58% /kafka/kafka_logs5
/dev/sdg                    1.7T  1.1T  563G  67% /kafka/kafka_logs6
/dev/sdh                    1.7T  962G  714G  58% /kafka/kafka_logs7
/dev/sdi                    1.7T  1.1T  621G  63% /kafka/kafka_logs8

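As the output above shows, the disk /kafka/kafka_logs6 is 67% used,

while /kafka/kafka_logs2 is only 56% used.

After a short investigation, we found that the partitions of our topics are not spread evenly across the disks.

For example,

take the topic cars_costs.ml, which has 100 partitions.

Now let's look at the JBOD disks.

On the disk /kafka/kafka_logs2 we have only 11 partitions of the topic cars_costs.ml,

but on the disk /kafka/kafka_logs6 we have 21 partitions of the same topic.

So we do not understand why Kafka places a different number of partitions on each JBOD disk.

To summarize, here is the number of partitions on each disk: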

disk                   number of partitions ( cars_costs.ml )
/kafka/kafka_logs2   - 11
/kafka/kafka_logs3   - 13
/kafka/kafka_logs4   - 20
/kafka/kafka_logs5   - 14
/kafka/kafka_logs6   - 21
/kafka/kafka_logs7   - 10
/kafka/kafka_logs8   - 11
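
A count like the one above can be reproduced with a small script. This is only a sketch, assuming it runs on the broker host with read access to the log directories, and relying on the fact that Kafka stores each partition in a directory named <topic>-<partition>. The function name count_topic_partitions is made up for illustration.

```python
import os
import re


def count_topic_partitions(log_dirs, topic):
    """Return {log_dir: number of partition directories belonging to `topic`}."""
    # Partition directories are named "<topic>-<partition number>".
    # re.escape matters here because topic names may contain dots.
    pattern = re.compile(r"^" + re.escape(topic) + r"-(\d+)$")
    counts = {}
    for log_dir in log_dirs:
        counts[log_dir] = 0
        for entry in os.listdir(log_dir):
            full_path = os.path.join(log_dir, entry)
            if pattern.match(entry) and os.path.isdir(full_path):
                counts[log_dir] += 1
    return counts
```

For the setup described here it could be called as count_topic_partitions(["/kafka/kafka_logs2", "/kafka/kafka_logs6"], "cars_costs.ml"). Note that it counts partitions only; it says nothing about how many bytes each partition holds, which is what df -h actually reflects.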

Relevant parameters already set in server.properties:

offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
default.replication.factor=3
broker.rack=/default-rack

The full set of parameters is:

more server.properties
auto.create.topics.enable=false
auto.leader.rebalance.enable=true
background.threads=10
log.retention.bytes=-1
log.retention.hours=48
delete.topic.enable=true
leader.imbalance.check.interval.seconds=300
leader.imbalance.per.broker.percentage=10
log.dir=/kafka/kafka-logs2,/kafka/kafka-logs3 ...............
log.flush.interval.messages=9223372036854775807
log.flush.interval.ms=1000
log.flush.offset.checkpoint.interval.ms=60000
log.flush.scheduler.interval.ms=9223372036854775807
log.flush.start.offset.checkpoint.interval.ms=60000
compression.type=producer
log.roll.jitter.hours=0
log.segment.bytes=1073741824
log.segment.delete.delay.ms=60000
message.max.bytes=1000012
min.insync.replicas=1
num.io.threads=10
num.network.threads=48
num.recovery.threads.per.data.dir=1
num.replica.fetchers=1
offset.metadata.max.bytes=4096
offsets.commit.required.acks=-1
offsets.commit.timeout.ms=5000
offsets.load.buffer.size=5242880
offsets.retention.check.interval.ms=600000
offsets.retention.minutes=10080
offsets.topic.compression.codec=0
offsets.topic.num.partitions=50
offsets.topic.replication.factor=3
offsets.topic.segment.bytes=104857600
queued.max.requests=1000
quota.consumer.default=9223372036854775807
quota.producer.default=9223372036854775807
replica.fetch.min.bytes=1
replica.fetch.wait.max.ms=500
replica.high.watermark.checkpoint.interval.ms=5000
replica.lag.time.max.ms=10000
replica.socket.receive.buffer.bytes=65536
replica.socket.timeout.ms=30000
request.timeout.ms=30000
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
socket.send.buffer.bytes=102400
transaction.max.timeout.ms=900000
transaction.state.log.load.buffer.size=5242880
transaction.state.log.min.isr=2
transaction.state.log.num.partitions=50
transaction.state.log.replication.factor=3
transaction.state.log.segment.bytes=104857600
transactional.id.expiration.ms=604800000
unclean.leader.election.enable=false
zookeeper.connection.timeout.ms=600000
zookeeper.max.in.flight.requests=10
zookeeper.session.timeout.ms=600000
zookeeper.set.acl=false
broker.id.generation.enable=true
connections.max.idle.ms=600000
connections.max.reauth.ms=0
controlled.shutdown.enable=true
controlled.shutdown.max.retries=3
controlled.shutdown.retry.backoff.ms=5000
controller.socket.timeout.ms=30000
default.replication.factor=3
delegation.token.expiry.time.ms=86400000
delegation.token.max.lifetime.ms=604800000
delete.records.purgatory.purge.interval.requests=1
fetch.purgatory.purge.interval.requests=1000
group.initial.rebalance.delay.ms=3000
group.max.session.timeout.ms=1800000
group.max.size=2147483647
group.min.session.timeout.ms=6000
log.cleaner.backoff.ms=15000
log.cleaner.dedupe.buffer.size=134217728
log.cleaner.delete.retention.ms=86400000
log.cleaner.enable=true
log.cleaner.io.buffer.load.factor=0.9
log.cleaner.io.buffer.size=524288
log.cleaner.io.max.bytes.per.second=1.7976931348623157e308
log.cleaner.max.compaction.lag.ms=9223372036854775807
log.cleaner.min.cleanable.ratio=0.5
log.cleaner.min.compaction.lag.ms=0
log.cleaner.threads=1
log.cleanup.policy=delete
log.index.interval.bytes=4096
log.index.size.max.bytes=10485760
log.message.timestamp.difference.max.ms=9223372036854775807
log.message.timestamp.type=CreateTime
log.preallocate=false
log.retention.check.interval.ms=300000
max.connections=2147483647
max.connections.per.ip=2147483647
max.incremental.fetch.session.cache.slots=1000
num.partitions=1
producer.purgatory.purge.interval.requests=1000
queued.max.request.bytes=-1
replica.fetch.backoff.ms=1000
replica.fetch.max.bytes=1048576
replica.fetch.response.max.bytes=10485760
reserved.broker.max.id=1500
transaction.abort.timed.out.transaction.cleanup.interval.ms=60000
transaction.remove.expired.transaction.cleanup.interval.ms=3600000
zookeeper.sync.time.ms=2000
broker.rack=/default-rack

Tags: apache-kafka

Solution


I looked it up, and this appears to be known behavior of Kafka on JBOD disks.

https://mail-archives.apache.org/mod_mbox/kafka-users/201506.mbox/%3CCAA+BczTLvZND4MGsG-LBM-wutzTNy3CXKLRRjo_55Xp00fwXLw@mail.gmail.com%3E

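There are even three KIPs for this.

In short: yes, the assignment of partitions to disks is unbalanced, but as an administrator you can reassign them, for example with the kafka-reassign-partitions.sh script. This is also very useful if the load on your partitions is uneven and you need the disk assignment to reflect that.

Of course, if you have Confluent Platform, it handles this for you. https://docs.confluent.io/platform/current/kafka/rebalancer/index.html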
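
As a rough illustration of what such a reassignment can look like, the sketch below builds the JSON file that kafka-reassign-partitions.sh accepts, including the optional log_dirs field that pins a replica to a specific disk. The helper name build_disk_moves, the broker IDs 1, 2, 3 and the choice of partition 5 are assumptions made for the example, not values taken from the question; the real replica list has to come from the current topic description.

```python
import json


def build_disk_moves(topic, moves):
    """Build a reassignment plan that moves replicas between log directories.

    `moves` is a list of (partition, replicas, log_dirs) tuples. `replicas`
    lists the broker IDs of the partition; `log_dirs` has one entry per
    replica: an absolute log directory path, or "any" to let the broker
    choose.
    """
    partitions = []
    for partition, replicas, log_dirs in moves:
        if len(replicas) != len(log_dirs):
            raise ValueError("replicas and log_dirs must have the same length")
        partitions.append({
            "topic": topic,
            "partition": partition,
            "replicas": list(replicas),
            "log_dirs": list(log_dirs),
        })
    return {"version": 1, "partitions": partitions}


# Hypothetical example: keep partition 5 on brokers 1, 2, 3, but place the
# replica on broker 1 on the less full disk. The path must match a log
# directory configured on that broker exactly.
plan = build_disk_moves(
    "cars_costs.ml",
    [(5, [1, 2, 3], ["/kafka/kafka_logs2", "any", "any"])],
)
print(json.dumps(plan, indent=2))
```

Saved to a file, say moves.json, the plan would then be applied with kafka-reassign-partitions.sh --bootstrap-server <broker:port> --reassignment-json-file moves.json --execute, and checked afterwards with the same arguments and --verify instead of --execute. Moving replicas between log directories requires --bootstrap-server rather than --zookeeper.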

You live and learn...

