apache-spark - 增量表作为流源。如何计算消费滞后

问题描述

我正在使用 apace spark 结构化流从增量表中流式传输数据。问题是我对消费者滞后一无所知。我们如何计算消费者滞后？

我也扔掉了每批创建的检查点文件，但它只有一个时代的时间戳。

https://docs.microsoft.com/en-us/azure/databricks/delta/delta-streaming#:~:text=for%20a%20stream-,Delta%20table%20as%20a%20stream%20source,and% 20tables%20as%20a%20stream。

  .format("delta")
  .load("/mnt/delta/events")
  .groupBy("customerId")
  .count()
  .writeStream
  .format("delta")
  .outputMode("complete")
  .option("checkpointLocation", "/mnt/delta/eventsByCustomer/_checkpoints/streaming-agg")
  .start("/mnt/delta/eventsByCustomer")

标签： apache-sparkapache-kafkaspark-structured-streaming

apache-spark - 增量表作为流源。如何计算消费滞后

问题描述

解决方案

推荐阅读