How to prioritize StreamRecord selection of one stream over another based on availability?

Problem description

Given two streams A and B in Flink, I want to process stream A until it is empty, then read from B until records arrive at A again.

I am looking for a loose contract. The InputSelectable interface seems to provide a notion of read priority.

Based on this answer, I see a round-robin implementation of stream reads. However, the documentation does not make clear what happens if one of the streams becomes empty. Does the operator stop processing records altogether?

One naive way to implement this would be to use timers to poll for inactivity on a stream before switching to a lower-priority stream, but this might be too inefficient.

Questions:

  1. Is there a built-in stream operator to achieve the above use case?
  2. What is the behavior of InputSelectable if one input becomes empty?
  3. Is a timer-based InputSelectable implementation the way to go?

Tags: apache-flink, flink-streaming

Solution


Answers:

  1. No.
  2. I don't believe the behavior is specified or guaranteed.
  3. It should be feasible, but you need to be careful.

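It is possible to get yourself into trouble with InputSelectable. If you completely starve one of the inputs, you will prevent checkpoint barrier alignment from completing, which in turn blocks checkpointing. It is also possible to construct topologies that deadlock.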

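You may need to consider how the network buffer timeout interacts with the timer. I think you may want to set the network buffer timeout for stream A to less than the timeout you use for the timer.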
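To make the timer-based idea a bit more concrete, here is a minimal sketch of just the decision logic, written as plain Java with no Flink dependency. Everything in it is hypothetical: `PrioritySelectionSketch`, `PriorityPolicy`, `replay`, and the timeout values are illustration only, and the `Selection` enum is a stand-in for Flink's `InputSelection.FIRST` and `InputSelection.ALL`. It assumes that A counts as "empty" once no record has arrived from it for `idleTimeoutMs`.

```java
import java.util.ArrayList;
import java.util.List;

class PrioritySelectionSketch {

    /** Stand-in for Flink's InputSelection.FIRST / InputSelection.ALL (not the real class). */
    enum Selection { FIRST, ALL }

    /** Hypothetical policy: read only A while it is active, widen to ALL once A has been idle. */
    static final class PriorityPolicy {
        private final long idleTimeoutMs;
        private long lastRecordFromA;

        PriorityPolicy(long idleTimeoutMs, long startMs) {
            this.idleTimeoutMs = idleTimeoutMs;
            this.lastRecordFromA = startMs;
        }

        /** Call whenever a record from A is processed. */
        void onRecordFromA(long nowMs) {
            lastRecordFromA = nowMs;
        }

        /** Decide which input(s) to read next, given the current time. */
        Selection nextSelection(long nowMs) {
            boolean aIsIdle = nowMs - lastRecordFromA >= idleTimeoutMs;
            return aIsIdle ? Selection.ALL : Selection.FIRST;
        }
    }

    /** Replays sorted arrival times of A's records and samples the selection at sorted probe times. */
    static List<Selection> replay(long idleTimeoutMs, long[] arrivalsOfA, long[] probeTimes) {
        PriorityPolicy policy = new PriorityPolicy(idleTimeoutMs, 0L);
        List<Selection> result = new ArrayList<>();
        int next = 0;
        for (long probe : probeTimes) {
            // Deliver every A record that arrived at or before this probe time.
            while (next < arrivalsOfA.length && arrivalsOfA[next] <= probe) {
                policy.onRecordFromA(arrivalsOfA[next]);
                next++;
            }
            result.add(policy.nextSelection(probe));
        }
        return result;
    }

    public static void main(String[] args) {
        // A delivers records at t=10 and t=20, goes quiet, then resumes at t=200.
        long[] arrivalsOfA = {10, 20, 200};
        long[] probeTimes = {15, 25, 80, 205};
        System.out.println(replay(50, arrivalsOfA, probeTimes)); // [FIRST, FIRST, ALL, FIRST]
    }
}
```

The sketch widens to ALL rather than switching to the second input only, so that new records on A can still be picked up while B is being read. It does not address the caveats above: while FIRST is selected, B is starved, so in a real job you would still need to bound how long that can last.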

