Error calling o898.save: Exception encountered in Azure Synapse Analytics connector code

Problem Description

def synapsedump(targetmount, targetfolder, table, df):
  # Clear the shared staging location used by the Synapse connector
  dbutils.fs.rm("/mnt/tmp", recurse=True)
  df.createOrReplaceTempView(table)
  # Load via the COPY statement rather than PolyBase
  spark.conf.set("spark.databricks.sqldw.writeSemantics", "copy")
  schema = "Amazon"
  schematable = schema + "." + table
  # Re-read through the temp view and log the row count before writing
  df = spark.sql("select * from " + table)
  print(df.count())

  df.write \
    .format("com.databricks.spark.sqldw") \
    .option("url", sqlconnectionstring) \
    .option("forwardSparkAzureStorageCredentials", "true") \
    .option("dbTable", schematable) \
    .option("tableOptions", "distribution=hash(HashSha2)") \
    .option("maxStrLength", "4000") \
    .option("tempDir", sqltempdir) \
    .mode("append") \
    .save()
  # Keep a parquet copy of the same data in ADLS
  df.write.mode("append").parquet(targetmount + targetfolder + table)

This is the write function I use to write multiple tables to Synapse from the same notebook.
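A hypothetical invocation, for context only (the mount, folder, table name, and DataFrame below are all placeholders, not values from the actual job):

# All arguments are illustrative placeholders.
synapsedump("/mnt/datalake", "/exports/", "Orders", orders_df)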

The code used to work fine, but at some point it started throwing the following error:


Py4JJavaError: An error occurred while calling o898.save.
: com.databricks.spark.sqldw.SqlDWConnectorException: Exception encountered in Azure Synapse Analytics connector code.
    at com.databricks.spark.sqldw.Utils$.wrapExceptions(Utils.scala:444)
    at com.databricks.spark.sqldw.DefaultSource.createRelation(DefaultSource.scala:86)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:91)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:200)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$3(SparkPlan.scala:252)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:248)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:192)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:158)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:157)
    at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:999)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$5(SQLExecution.scala:116)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:249)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:101)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:845)
    at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:77)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:199)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:999)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:437)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:421)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Unexpected version returned: Microsoft SQL Azure (RTM) - 12.0.2000.8 
    Jun 24 2021 23:23:51 
    Copyright (C) 2019 Microsoft Corporation

Make sure your JDBC url includes a "database=<DataWareHouseName>" option and that
it points to a valid Azure Synapse SQL Analytics (Azure SQL Data Warehouse) name.
This connector cannot be used for interacting with any other systems (e.g. Azure
SQL Databases).
               
    at com.databricks.spark.sqldw.DefaultSource.$anonfun$validateJdbcConnection$2(DefaultSource.scala:146)
    at com.databricks.spark.sqldw.DefaultSource.$anonfun$validateJdbcConnection$2$adapted(DefaultSource.scala:140)
    at com.databricks.spark.sqldw.JDBCWrapper.$anonfun$executeQueryInterruptibly$1(SqlDWJDBCWrapper.scala:105)
    at com.databricks.spark.sqldw.JDBCWrapper.withPreparedStatement(SqlDWJDBCWrapper.scala:307)
    at com.databricks.spark.sqldw.JDBCWrapper.executeQueryInterruptibly(SqlDWJDBCWrapper.scala:102)
    at com.databricks.spark.sqldw.DefaultSource.$anonfun$validateJdbcConnection$1(DefaultSource.scala:140)
    at com.databricks.spark.sqldw.DefaultSource.$anonfun$validateJdbcConnection$1$adapted(DefaultSource.scala:138)
    at com.databricks.spark.sqldw.JDBCWrapper.withConnection(SqlDWJDBCWrapper.scala:285)
    at com.databricks.spark.sqldw.DefaultSource.validateJdbcConnection(DefaultSource.scala:138)
    at com.databricks.spark.sqldw.DefaultSource.$anonfun$createRelation$3(DefaultSource.scala:88)
    at com.databricks.spark.sqldw.Utils$.wrapExceptions(Utils.scala:410)
    ... 33 more

The only change from before is that the notebook now runs in a different resource group, although it still reads data from ADLS in the old resource group. I ran dbutils.fs.ls from the new resource group's notebook against the old resource group's storage and could list all the files in ADLS, so connectivity should not be the problem. I also rolled the Databricks runtime back from 8 to the original 7.3.
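For reference, the connectivity check was just a listing call against the old mount (the mount path below is a placeholder):

# "/mnt/old-rg-data" is a hypothetical mount of the old resource group's ADLS
# container; a successful listing confirms the new workspace can reach it.
files = dbutils.fs.ls("/mnt/old-rg-data")
for f in files[:5]:
    print(f.path)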

Below is the connection string format:

sqlconnectionstring = "jdbc:sqlserver://"+sqlserver+":1433;database="+sqldatabase+";user="+sqluser+";password="+sqlpassword
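For context: per the validation message in the stack trace, the connector requires the database option in this URL to point at a Synapse dedicated SQL pool, not an Azure SQL Database. A minimal sketch, where synapsepool is a placeholder for the dedicated pool's database name:

# The sqldw connector checks the server version behind this URL and rejects
# anything that is not Azure Synapse; "synapsepool" is a placeholder name.
sqlconnectionstring = (
    "jdbc:sqlserver://" + sqlserver + ":1433"
    + ";database=" + synapsepool
    + ";user=" + sqluser
    + ";password=" + sqlpassword
)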

Please help; I cannot find the root cause of this error at all. I am only using a dedicated pool, not an on-demand (serverless) one. (I checked.)

Tags: sql-server, azure, azure-databricks

Solution


It turned out I was using the data-warehouse connector to write to a SQL Server database rather than to Synapse. Changing the format to JDBC with df.write.format('jdbc') made it work. However, this format is very slow when writing to SQL Server, so if there is a faster option, please point me to it.
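A minimal sketch of the plain-JDBC write described above (sqlconnectionstring and schematable are the variables from the question; batchsize is a standard Spark JDBC writer option, and the value shown is illustrative):

(df.write
   .format("jdbc")
   .option("url", sqlconnectionstring)
   .option("dbtable", schematable)
   # Larger batches reduce per-row round trips; tune to the workload.
   .option("batchsize", "10000")
   .mode("append")
   .save())

As for a faster option: one commonly used alternative is the Apache Spark connector for SQL Server (format "com.microsoft.sqlserver.jdbc.spark"), which writes via bulk-copy APIs. It has to be installed on the cluster, and the write is otherwise the same with only the format string swapped.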

