首页 > 解决方案 > 从 Azure Eventhub => StreamingQueryException 读取 Spark:输入字节数组有错误的 4 字节结束单元

问题描述

我正在尝试使用 Spark/Python 收集 Azure Eventhub 消息。每次,我都会收到异常“StreamingQueryException:输入字节数组有错误的 4 字节结束单元”

请问有什么想法吗?

conf = {}
conf["eventhubs.connectionString"] = "Endpoint=sb://XXXXXXXXX.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=XXXXXXXXXXXXX=;EntityPath=XXXXXX"
                                      
read_df  = spark.readStream.format("eventhubs").options(**conf).load()
stream = read_df.writeStream.format("console").start()
stream.awaitTermination()

标签: apache-sparkpysparkazure-eventhub

解决方案


请注意,对于 2.3.15 及以上版本,您需要对配置字典中的连接字符串进行加密:

ehConf['eventhubs.connectionString'] = sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connectionString)

https://github.com/Azure/azure-event-hubs-spark/blob/master/docs/PySpark/structured-streaming-pyspark.md#event-hubs-configuration


推荐阅读