sql-server - An error occurred while calling o898.save. Exception encountered in Azure Synapse Analytics connector code
Problem
```python
def synapsedump(targetmount, targetfolder, table, df):
    # Clear the temp staging area used by the Synapse connector
    dbutils.fs.rm("/mnt/tmp", recurse=True)
    df.createOrReplaceTempView(table)
    # Use COPY semantics for the bulk load into Synapse
    spark.conf.set("spark.databricks.sqldw.writeSemantics", "copy")
    schema = "Amazon"
    schematable = schema + "." + table
    df = spark.sql("select * from " + table)
    print(df.count())
    df.write \
        .format("com.databricks.spark.sqldw") \
        .option("url", sqlconnectionstring) \
        .option("forwardSparkAzureStorageCredentials", "true") \
        .option("dbTable", schematable) \
        .option("tableOptions", "distribution=hash(HashSha2)") \
        .option("maxStrLength", "4000") \
        .option("tempDir", sqltempdir) \
        .mode("append") \
        .save()
    # Also persist a parquet copy to ADLS
    df.write.mode("append").parquet(targetmount + targetfolder + table)
```
The function above is what I use to write multiple tables to Synapse from a single notebook. It used to work fine, but at some point it started throwing the following error:
Py4JJavaError: An error occurred while calling o898.save.
: com.databricks.spark.sqldw.SqlDWConnectorException: Exception encountered in Azure Synapse Analytics connector code.
at com.databricks.spark.sqldw.Utils$.wrapExceptions(Utils.scala:444)
at com.databricks.spark.sqldw.DefaultSource.createRelation(DefaultSource.scala:86)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:91)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:200)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$3(SparkPlan.scala:252)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:248)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:192)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:158)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:157)
at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:999)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$5(SQLExecution.scala:116)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:249)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:101)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:845)
at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:77)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:199)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:999)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:437)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:421)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:295)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:251)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Unexpected version returned: Microsoft SQL Azure (RTM) - 12.0.2000.8
Jun 24 2021 23:23:51
Copyright (C) 2019 Microsoft Corporation
Make sure your JDBC url includes a "database=<DataWareHouseName>" option and that
it points to a valid Azure Synapse SQL Analytics (Azure SQL Data Warehouse) name.
This connector cannot be used for interacting with any other systems (e.g. Azure
SQL Databases).
at com.databricks.spark.sqldw.DefaultSource.$anonfun$validateJdbcConnection$2(DefaultSource.scala:146)
at com.databricks.spark.sqldw.DefaultSource.$anonfun$validateJdbcConnection$2$adapted(DefaultSource.scala:140)
at com.databricks.spark.sqldw.JDBCWrapper.$anonfun$executeQueryInterruptibly$1(SqlDWJDBCWrapper.scala:105)
at com.databricks.spark.sqldw.JDBCWrapper.withPreparedStatement(SqlDWJDBCWrapper.scala:307)
at com.databricks.spark.sqldw.JDBCWrapper.executeQueryInterruptibly(SqlDWJDBCWrapper.scala:102)
at com.databricks.spark.sqldw.DefaultSource.$anonfun$validateJdbcConnection$1(DefaultSource.scala:140)
at com.databricks.spark.sqldw.DefaultSource.$anonfun$validateJdbcConnection$1$adapted(DefaultSource.scala:138)
at com.databricks.spark.sqldw.JDBCWrapper.withConnection(SqlDWJDBCWrapper.scala:285)
at com.databricks.spark.sqldw.DefaultSource.validateJdbcConnection(DefaultSource.scala:138)
at com.databricks.spark.sqldw.DefaultSource.$anonfun$createRelation$3(DefaultSource.scala:88)
at com.databricks.spark.sqldw.Utils$.wrapExceptions(Utils.scala:410)
... 33 more
The only change since it last worked is that the notebook now runs in a different resource group, although it still reads its data from ADLS in the old resource group. I ran dbutils.fs.ls from the new resource group's notebook against the old resource group's storage and could list all the files in ADLS, so connectivity should not be the problem. I also rolled the Databricks runtime back from 8 to the original 7.3.

The connection string is built as follows:

```python
sqlconnectionstring = "jdbc:sqlserver://" + sqlserver + ":1433;database=" + sqldatabase + ";user=" + sqluser + ";password=" + sqlpassword
```

Please help, because I cannot find the root cause of this error at all. I am only using a dedicated SQL pool, not the serverless (on-demand) one (I checked).
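The "Unexpected version returned: Microsoft SQL Azure" message in the trace means the connector probed the server's version banner and got an Azure SQL Database response rather than a Synapse dedicated pool, so it refused the write. A minimal sanity check for the URL shape the error message asks for, in pure Python with hypothetical server and database names standing in for the real secrets:

```python
def check_synapse_url(url):
    """Return True if the JDBC URL at least names a database,
    which the Synapse connector's error message requires."""
    return url.startswith("jdbc:sqlserver://") and "database=" in url

# Hypothetical values; the real sqlserver/sqldatabase come from secrets.
good_url = ("jdbc:sqlserver://myserver.database.windows.net:1433;"
            "database=mydwpool;user=admin;password=secret")
bad_url = "jdbc:sqlserver://myserver.database.windows.net:1433;user=admin"

print(check_synapse_url(good_url))  # True
print(check_synapse_url(bad_url))   # False
```

Note this only checks the URL's shape; even a well-formed URL fails if the database it names is an Azure SQL Database rather than a dedicated SQL pool, which is what the version check in the trace is detecting.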
Solution
It turned out I was writing to a SQL Server (Azure SQL Database) target using the data warehouse connector. Changing the format to plain JDBC, df.write.format('jdbc'), made it work. However, this format is very slow when writing to SQL Server; if there is a faster option, please point me to it.
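For reference, a sketch of the plain-JDBC append. The helper below (a hypothetical name, not from the original post) just assembles the option map; batchsize and numPartitions are standard Spark JDBC data source options that often improve write throughput, with example values:

```python
def jdbc_write_options(url, table, batchsize=10000, num_partitions=8):
    """Assemble options for df.write.format('jdbc') (values are examples)."""
    return {
        "url": url,                            # same sqlconnectionstring as above
        "dbtable": table,                      # e.g. "Amazon.MyTable"
        "batchsize": str(batchsize),           # rows sent per INSERT batch
        "numPartitions": str(num_partitions),  # max parallel JDBC connections
    }

# Usage on a Databricks cluster (not runnable outside Spark):
# df.write.format("jdbc") \
#     .options(**jdbc_write_options(sqlconnectionstring, schematable)) \
#     .mode("append") \
#     .save()
```

Tuning batchsize up and matching numPartitions to what the target database can absorb is the usual first step before reaching for a dedicated bulk-copy connector.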