Why does spark.jdbc need dbtable / query

Problem description

I am a Spark newbie, and I don't understand why dbtable or query has to be supplied as part of the JDBC options.

For example, when using it with the Presto JDBC driver, the Presto driver does not like the url, driver, dbtable, and query parameters; other drivers perform similar validation (e.g., the CData driver for Presto).

url = "jdbc:presto:Server=spill.asifkazi.cp.ahana.cloud;Port=443;"
jdbcDriver = "com.facebook.presto.jdbc.PrestoDriver" 
sqlQuery = "select * from customer limit 1"
jdbcOptions = spark.read.format("jdbc")
jdbcOptions.option("url",jdbcUrl)
jdbcOptions.option("user", user)
jdbcOptions.option("password", password)
jdbcOptions.option("query",sqlQuery)
df = jdbcOptions.load()
df.show()
21/05/13 21:50:41 INFO SharedState: Warehouse path is 'file:/Users/asifkazi/Downloads/Projects/pyspark/spark-warehouse'.
Traceback (most recent call last):
  File "/Users/asifkazi/Downloads/Projects/pyspark/test_jdbc.py", line 32, in <module>
    df = jdbcOptions.load()
  File "/usr/local/Cellar/apache-spark/3.1.1/libexec/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 210, in load
  File "/usr/local/Cellar/apache-spark/3.1.1/libexec/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in __call__
  File "/usr/local/Cellar/apache-spark/3.1.1/libexec/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
  File "/usr/local/Cellar/apache-spark/3.1.1/libexec/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o35.load.
: java.sql.SQLException: Unrecognized connection property 'driver'
    at com.facebook.presto.jdbc.PrestoDriverUri.validateConnectionProperties(PrestoDriverUri.java:353)
    at com.facebook.presto.jdbc.PrestoDriverUri.<init>(PrestoDriverUri.java:104)
    at com.facebook.presto.jdbc.PrestoDriverUri.<init>(PrestoDriverUri.java:94)
    at com.facebook.presto.jdbc.PrestoDriver.connect(PrestoDriver.java:87)
    at org.apache.spark.sql.execution.datasources.jdbc.connection.BasicConnectionProvider.getConnection(BasicConnectionProvider.scala:49)
    at 

Why can't I simply create a JDBC connection for Spark, the way I would in plain JDBC, and then run queries against it independently? Is there a way to run the query without passing this information as part of the JDBC options?

Tags: apache-spark, jdbc, pyspark

Solution


You should wrap the query as an aliased subquery and pass it to the database. Assuming the URL is fine, the code below should work; I don't have Presto available to test it.

query = """(select * from customer limit 1) query_wrap"""
url = 'jdbc:presto:Server=spill.asifkazi.cp.ahana.cloud;Port=443'
connectionProperties = {"user": "user" , "password": "paswwrod", "driver": "com.facebook.presto.jdbc.PrestoDriver", "fetchsize": "10000"}
df = spark.read.jdbc(url = url, table = query, properties = connectionProperties)
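
For comparison, here is a minimal sketch of the same wrapped subquery passed through the generic DataFrameReader "jdbc" format using the dbtable option instead of spark.read.jdbc. The URL, user, and password are placeholders taken from the question, and I have not verified this against a live Presto endpoint:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("presto-jdbc-example").getOrCreate()

# Same aliased subquery as above; Spark will embed it as the FROM clause
# of the SELECT statement it generates.
query = "(select * from customer limit 1) query_wrap"

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:presto:Server=spill.asifkazi.cp.ahana.cloud;Port=443")
    .option("driver", "com.facebook.presto.jdbc.PrestoDriver")
    .option("user", "user")          # placeholder credentials
    .option("password", "password")
    .option("dbtable", query)        # Spark reads: SELECT ... FROM (subquery) query_wrap
    .load()
)
df.show()

Either way, Spark needs dbtable or query because it generates the SQL it sends over JDBC itself (for example for column pruning and partitioned reads), so it cannot simply reuse an arbitrary statement the way a plain JDBC program would.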
