Spark - MongoDB (problem when I overwrite my collection)

Problem description

I wrote this code to overwrite a collection in MongoDB. But when I overwrite the collection, my indexes are dropped. Is there a way to send the indexes along, or to create them when I overwrite the collection?

from pyspark.sql import SparkSession

# Connect to Cosmos DB to write to the collection
userName = dbutils.secrets.get(scope="MONGO" , key="MONGO_USER")
primaryKey = dbutils.secrets.get(scope="MONGO" , key="MONGO_PASS")
host = dbutils.secrets.get(scope="MONGO" , key="MONGO_HOST")
port = dbutils.secrets.get(scope="MONGO" , key="MONGO_PORT")
database = "ccvcmdbmongosbox" 
collection = "COLL_CCVIMPACTOS"

# Build the connection string
connectionString = "mongodb://{0}:{1}@{2}:{3}/{4}.{5}?ssl=true&replicaSet=globaldb&retrywrites=false&maxIdleTimeMS=120000".format(userName, primaryKey, host, port, database, collection)


spark = SparkSession\
    .builder\
    .config('spark.mongodb.input.uri', connectionString)\
    .config('spark.mongodb.output.uri', connectionString)\
    .config('spark.jars.packages', 'org.mongodb.spark:mongo-spark-connector_2.11:2.3.1')\
    .getOrCreate()

# impactos_mongo is the DataFrame prepared earlier in the notebook
impactos_mongo.write.format("com.mongodb.spark.sql.DefaultSource")\
    .mode("overwrite")\
    .option("uri", connectionString)\
    .option("replaceDocument", False)\
    .option("maxBatchSize", 100)\
    .option("database", database)\
    .option("collection", collection)\
    .save()

Tags: mongodb, pyspark, databricks

Solution


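The indexes disappear because mode("overwrite") in the MongoDB Spark connector drops the target collection before writing the new data, and dropping a collection removes its indexes along with its documents. The connector itself offers no option to carry indexes across the drop, so they have to be recreated afterwards, or the drop avoided entirely.

One workaround is to recreate the indexes right after the overwrite finishes. A minimal sketch, assuming pymongo is installed on the Databricks cluster and reusing connectionString, database, and collection from the code above; the field name "idx_field" is a hypothetical placeholder for whatever fields your real indexes cover:

import pymongo
from pymongo import MongoClient

# Reconnect with pymongo once the Spark write has completed.
client = MongoClient(connectionString)
coll = client[database][collection]

# Recreate each index the drop removed ("idx_field" is a placeholder).
coll.create_index([("idx_field", pymongo.ASCENDING)])
client.close()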
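An alternative is to never drop the collection at all: empty it with delete_many and then write with mode("append"), so the existing indexes survive untouched. A sketch under the same assumptions:

from pymongo import MongoClient

# Empty the collection without dropping it, so its indexes are preserved.
client = MongoClient(connectionString)
client[database][collection].delete_many({})
client.close()

# Append the new data into the (now empty) collection.
impactos_mongo.write.format("com.mongodb.spark.sql.DefaultSource")\
    .mode("append")\
    .option("uri", connectionString)\
    .option("maxBatchSize", 100)\
    .option("database", database)\
    .option("collection", collection)\
    .save()

On Cosmos DB the second approach is often the safer one, since rebuilding indexes on a freshly loaded collection can be slow and subject to request-unit throttling.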