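apache-kafka-streams - leftJoin two GlobalKTables
Problem description
I'm trying to join a stream to 2 different GlobalKTables, treating them as lookups: specifically, device data (keyed by user agent) and geocoding data (keyed by IP address).
The problem is with serialization, but I don't understand why. It fails on DEFAULT_VALUE_SERDE_CLASS_CONFIG, even though the topic I'm writing to is serialized correctly.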
//
// Set up serialization / de-serialization
private static Serde<String> stringSerde = Serdes.String();
private static Serde<PodcastData> podcastSerde = StreamsSerdes.PodCastSerde();
private static Serde<GeoCodedData> geocodedSerde = StreamsSerdes.GeoIPSerde();
private static Serde<DeviceData> deviceSerde = StreamsSerdes.DeviceSerde();
private static Serde<JoinedPodcastGeoDeviceData> podcastGeoDeviceSerde = StreamsSerdes.PodcastGeoDeviceSerde();
private static Serde<JoinedPodCastDeviceData> podcastDeviceSerde = StreamsSerdes.PodcastDeviceDataSerde();
...
GlobalKTable<String, DeviceData> deviceIDTable = builder.globalTable(kafkaProperties.getProperty("deviceid-topic"));
GlobalKTable<String, GeoCodedData> geoIPTable = builder.globalTable(kafkaProperties.getProperty("geoip-topic"));
//
// Stream from source topic
KStream<String, PodcastData> podcastStream = builder.stream(
        kafkaProperties.getProperty("source-topic"),
        Consumed.with(stringSerde, podcastSerde));
//
podcastStream
        // left join the podcast stream to the device table, using the user agent as the lookup key
        .leftJoin(deviceIDTable,
                // map each podcast record to the key to look up in deviceIDTable (the user agent)
                (podcastID, podcastData) -> podcastData.getUser_agent(),
                // join podcast and device and return a JoinedPodCastDeviceData object
                (podcastData, deviceData) -> {
                    JoinedPodCastDeviceData data =
                            JoinedPodCastDeviceData.builder().build();
                    data.setPodcastObject(podcastData);
                    data.setDeviceData(deviceData);
                    return data;
                })
        // left join the result to the geo table, using the IP address as the lookup key
        .leftJoin(geoIPTable,
                // map each record to the key to look up in geoIPTable (the IP address)
                (podcastID, podcastDeviceData) -> podcastDeviceData.getPodcastObject().getIp_address(),
                // join the podcast/device data with the geo data
                (podcastDeviceData, geoCodedData) -> {
                    JoinedPodcastGeoDeviceData data =
                            JoinedPodcastGeoDeviceData.builder().build();
                    data.setGeoData(geoCodedData);
                    data.setDeviceData(podcastDeviceData.getDeviceData());
                    data.setPodcastData(podcastDeviceData.getPodcastObject());
                    return data;
                })
        //
        .to(kafkaProperties.getProperty("sink-topic"),
                Produced.with(stringSerde, podcastGeoDeviceSerde));
...
...
streamsConfiguration.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, stringSerde.getClass().getName());
streamsConfiguration.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, stringSerde.getClass().getName());
The error: ERROR java.lang.String cannot be cast to DeviceData
Solution
streamsConfiguration.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, stringSerde.getClass().getName());
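Because of the setting above, the application uses the String serde as the default value serde unless you explicitly specify a serde when creating a KTable, KStream, or GlobalKTable.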
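The fallback only surfaces at runtime, when a value is actually used. The following plain-Java sketch is an analogy, not Kafka code: a map holding Strings stands in for a table read with the default String serde, and the nested DeviceData class is a hypothetical stand-in for the question's type. Because generics are erased, the unchecked lookup itself succeeds and the ClassCastException is thrown only where the value is used as a DeviceData, which is consistent with the error appearing inside the join rather than when the table is declared.

```java
import java.util.HashMap;
import java.util.Map;

// Analogy only (not Kafka code).
public final class DefaultSerdeAnalogy {

    // Hypothetical stand-in for the question's DeviceData type.
    static final class DeviceData {
        final String userAgent;
        DeviceData(String userAgent) { this.userAgent = userAgent; }
    }

    // Plays the role of a table whose values were deserialized with the default
    // String serde: at runtime it only ever holds Strings.
    private static final Map<String, Object> rawStore = new HashMap<>();

    // Unchecked generic lookup: the declared type V is never checked here.
    @SuppressWarnings("unchecked")
    static <V> V lookup(String key) {
        return (V) rawStore.get(key); // erased cast, no runtime check at this point
    }

    // Returns true if using the looked-up value as DeviceData fails with ClassCastException.
    static boolean joinFailsWithClassCast() {
        rawStore.put("Mozilla/5.0", "{\"device\":\"desktop\"}"); // a String, not a DeviceData
        try {
            DeviceData device = lookup("Mozilla/5.0"); // the compiler inserts the checkcast here
            return device.userAgent == null;           // not reached
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(joinFailsWithClassCast());
    }
}
```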
Since deviceIDTable is expected to hold values of type DeviceData, you need to define the value serde on the GlobalKTable explicitly, as shown below:
GlobalKTable<String, DeviceData> deviceIDTable = builder.globalTable(
        kafkaProperties.getProperty("deviceid-topic"),
        Materialized.<String, DeviceData, KeyValueStore<Bytes, byte[]>>as(DEVICE_STORE)
                .withKeySerde(stringSerde)
                .withValueSerde(deviceSerde));
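The answer fixes only deviceIDTable; geoIPTable is also created without a serde, so its GeoCodedData values would presumably hit the same String fallback. Below is a minimal sketch, assuming kafka-streams is on the classpath, that passes explicit serdes through Consumed.with (an alternative to the Materialized form above) for both lookup tables. The class LookupTables, the helper lookupTable, and the hard-coded topic names are hypothetical placeholders; in the question the topic names come from kafkaProperties.

```java
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.GlobalKTable;

public final class LookupTables {

    // Hypothetical helper: registers a String-keyed GlobalKTable whose key and value
    // serdes are given explicitly, so the default value serde is not used for it.
    static <V> GlobalKTable<String, V> lookupTable(StreamsBuilder builder,
                                                   String topic,
                                                   Serde<V> valueSerde) {
        return builder.globalTable(topic, Consumed.with(Serdes.String(), valueSerde));
    }

    // Builds a topology containing both lookup tables. The real application would pass
    // StreamsSerdes.DeviceSerde() and StreamsSerdes.GeoIPSerde(); taking them as
    // parameters keeps this sketch independent of those custom classes.
    static <D, G> Topology buildLookupTopology(Serde<D> deviceSerde, Serde<G> geoSerde) {
        StreamsBuilder builder = new StreamsBuilder();
        GlobalKTable<String, D> deviceIDTable = lookupTable(builder, "deviceid-topic", deviceSerde);
        GlobalKTable<String, G> geoIPTable = lookupTable(builder, "geoip-topic", geoSerde);
        // podcastStream.leftJoin(deviceIDTable, ...).leftJoin(geoIPTable, ...) would go here
        return builder.build();
    }

    public static void main(String[] args) {
        // Serdes.String() stands in for the custom serdes so the sketch runs on its own.
        Topology topology = buildLookupTopology(Serdes.String(), Serdes.String());
        System.out.println(topology.describe());
    }
}
```

With the question's serdes, the call would read lookupTable(builder, kafkaProperties.getProperty("geoip-topic"), geocodedSerde). Consumed.with only supplies the serdes; the Materialized form is useful when you also want to name the backing store, as DEVICE_STORE does above.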