首页 > 解决方案 > Flink not forwarding Kafka metrics when parallelism greater than 1

问题描述

I have a Flink job which reads from Kafka (v0.9) and writes to Redis. I want to monitor the records-consumed-rate and records-lag-max metrics emitted by Kafka which Flink should be able to forward. In this case, I am forwarding to Datadog.

When I start the job with a parallelism of 1, I see this metric emitted just fine. However, if I make the parallelism greater than 1, this metric is no longer forwarded. The job is running when parallelism > 1 because I can see the entries being written to Redis.

I'm running Flink (v1.6.2) on AWS EMR:

The parallelism is set by streamExecutionEnvironment.setParallelism(). Each Kafka Consumer is instantiated with the same group.id and a unique client.id.

The DD agent is running just fine on the cluster. Many metrics are being emitted such as numberOfCompletedCheckpoints and upTime etc.

Is there any reason Flink would not be forwarding these metrics from Kafka if the parallelism is greater than 1?

Update: I also tried sending a custom DD metric (counter.inc()) from the Redis RichSinkFunction. When the parallelism=1, the metric is sent fine. When parallelism=7, the metric is not sent however it is being called (added a debug line). So it seems its not limited to the forwarded metrics from Kafka.

标签: apache-kafkaapache-flink

解决方案


问题是 HTTPRequest 的大小越大,并行度越高,这是有意义的。我回来了“请求实体太大”,但是异常没有正确注销,所以我错过了。

Flink DatadogHttpReporter在构建时似乎没有考虑请求的大小。我修改了 Reporter 以将每个请求的指标数量限制为 1000。现在指标显示得很好。


推荐阅读