java - Can you "cast" a PCollection<T> to a PCollection<byte[]> to avoid deserialization?
Problem description
My situation is that I'm writing records out to Kafka after a reshuffle and I'd like to avoid deserializing to my object type and then reserializing to the same byte array before writing out to Kafka. My pipeline is currently CPU bound and it seems like this would be a good way to reduce CPU usage.
I'm using the Flink runner (Flink 1.12). My data is coming from Kafka (not controlled by me) in big compressed chunks, and I'm breaking the chunks into individual records (fanout is 10ks or 100ks of output records per input record) and writing to another Kafka topic. To get reasonable throughput I have to reshuffle the output records to multiple workers, but this incurs some deserialization and serialization cost.
My Kafka serializer / deserializer actually delegate to the appropriate Beam coder, so the serialized content should be the same for both systems.
Actually, my type is PCollection<KV<K, V>>, and I'd like to cast to PCollection<KV<K, byte[]>>, but if the single-type case is already possible then that may suffice for my needs.
Solution
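One possible direction, sketched under assumptions and not verified against the Flink runner: rather than casting, encode each value to a byte array once before the reshuffle, carry KV<K, byte[]> through the shuffle, and let the Kafka value serializer pass the bytes through unchanged. The plain-Java sketch below only illustrates that data flow. `ValueEncoder`, `encodeValues` and `passThrough` are hypothetical stand-ins, not Beam or Kafka APIs. In a real pipeline the encoding step would presumably be a ParDo or MapElements calling `CoderUtils.encodeToByteArray` with the existing value coder, and the sink would use Kafka's `ByteArraySerializer`.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

final class PreEncodeSketch {

    // Hypothetical stand-in for the Beam coder that the Kafka serializer delegates to.
    interface ValueEncoder<V> {
        byte[] encode(V value);
    }

    // Encode every value exactly once, before the (simulated) shuffle.
    // Downstream steps then only move byte arrays around.
    static <K, V> List<Map.Entry<K, byte[]>> encodeValues(
            List<Map.Entry<K, V>> records, ValueEncoder<V> encoder) {
        List<Map.Entry<K, byte[]>> out = new ArrayList<>(records.size());
        for (Map.Entry<K, V> record : records) {
            out.add(new SimpleImmutableEntry<>(record.getKey(), encoder.encode(record.getValue())));
        }
        return out;
    }

    // Sink-side "serializer" for values that are already encoded: an identity pass-through.
    static byte[] passThrough(byte[] alreadyEncoded) {
        return alreadyEncoded;
    }

    public static void main(String[] args) {
        // Assumption for the demo: values are strings encoded as UTF-8.
        ValueEncoder<String> utf8 = s -> s.getBytes(StandardCharsets.UTF_8);

        List<Map.Entry<String, String>> records = new ArrayList<>();
        records.add(new SimpleImmutableEntry<>("k1", "hello"));
        records.add(new SimpleImmutableEntry<>("k2", "world"));

        List<Map.Entry<String, byte[]>> encoded = encodeValues(records, utf8);

        // The bytes handed to the sink equal what encoding at write time would have produced.
        for (int i = 0; i < records.size(); i++) {
            byte[] atSink = passThrough(encoded.get(i).getValue());
            byte[] encodedLate = utf8.encode(records.get(i).getValue());
            System.out.println(records.get(i).getKey() + " same bytes: " + Arrays.equals(atSink, encodedLate));
        }
    }
}
```

This only helps if the bytes produced by the coder really are the bytes the Kafka serializer would write, which is what the question states. The shuffle still copies the byte arrays, so any CPU saving would come from skipping the decode and re-encode of the value type, and it would need to be measured.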