Error connecting Azure Databricks to Cosmos DB Mongo API

Problem description

I have installed the MongoDB Spark connector in Databricks and tried to run the following sample code:

from pyspark.sql import SparkSession

# Create (or reuse) the Spark session.
my_spark = SparkSession \
    .builder \
    .appName("myApp") \
    .getOrCreate()

# Read the Cosmos DB collection through the MongoDB Spark connector;
# CONNECTION_STRING is defined below.
df = my_spark.read.format("com.mongodb.spark.sql.DefaultSource") \
    .option("uri", CONNECTION_STRING) \
    .load()

where CONNECTION_STRING has the following format:

mongodb://USERNAME:PASSWORD@testgp.documents.azure.com:10255/DATABASE_NAME.COLLECTION_NAME?ssl=true&replicaSet=globaldb
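
Note that a Cosmos DB primary key is a base64 string that often contains characters such as "=" or "+", which must be URL-encoded before being embedded in the URI. A minimal sketch of building the string in Python (the account name and all placeholder values are hypothetical):

from urllib.parse import quote_plus

# Hypothetical placeholders; substitute your own Cosmos DB account name,
# primary key, database, and collection.
USERNAME = "testgp"
PASSWORD = quote_plus("<primary-key>")  # URL-encode the key
CONNECTION_STRING = (
    "mongodb://{}:{}@testgp.documents.azure.com:10255/"
    "DATABASE_NAME.COLLECTION_NAME?ssl=true&replicaSet=globaldb"
).format(USERNAME, PASSWORD)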

but I am running into the following error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 15) (10.25.238.198 executor 0): java.io.InvalidClassException: com.mongodb.spark.rdd.partitioner.MongoPartition; local class incompatible: stream classdesc serialVersionUID = -2855217470084313385, local class serialVersionUID = -3413909316915051241

Has anyone run into this error, and is there a possible solution?

Tags: apache-spark, azure-cosmosdb, azure-databricks, azure-cosmosdb-mongoapi, databricks-connect

Solution
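
A java.io.InvalidClassException with mismatched serialVersionUID values means the JVM that serialized com.mongodb.spark.rdd.partitioner.MongoPartition (the driver) and the JVM that deserialized it (an executor) loaded two different builds of the connector class. Given the databricks-connect tag, a likely cause is that the mongo-spark-connector version on the local client differs from the one installed on the cluster, or that two connector versions are installed side by side; the fix is to remove duplicates and pin a single connector version that matches the cluster's Spark/Scala build. Below is a minimal sketch, assuming a Spark 3.x / Scala 2.12 runtime (adjust the Maven coordinates to your cluster; on Databricks, install the same coordinates as a cluster library rather than relying on spark.jars.packages, which only takes effect when the session is first created):

from pyspark.sql import SparkSession

# Pin one connector version that matches the cluster's Scala build.
# org.mongodb.spark:mongo-spark-connector_2.12:3.0.1 is an assumption
# for a Spark 3.x / Scala 2.12 runtime, not the poster's confirmed fix.
spark = (
    SparkSession.builder
    .appName("myApp")
    .config("spark.jars.packages",
            "org.mongodb.spark:mongo-spark-connector_2.12:3.0.1")
    .getOrCreate()
)

# Re-run the original read with the aligned connector.
df = (
    spark.read.format("com.mongodb.spark.sql.DefaultSource")
    .option("uri", CONNECTION_STRING)  # as defined in the question
    .load()
)

After aligning the versions, detach and reattach the notebook (or restart the cluster) so the executors pick up the same JAR as the driver.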

