apache-spark - Why are dbtable / query required in spark.jdbc?
Question
I'm a Spark novice, and I don't understand why dbtable OR query is required as part of the JDBC options.
For example, when using the Presto JDBC driver, the driver rejects the url, driver, dbtable, and query parameters. Other drivers perform similar validation (e.g. the CData driver for Presto).
url = "jdbc:presto:Server=spill.asifkazi.cp.ahana.cloud;Port=443;"
jdbcDriver = "com.facebook.presto.jdbc.PrestoDriver"
sqlQuery = "select * from customer limit 1"
jdbcOptions = spark.read.format("jdbc") \
    .option("url", url) \
    .option("driver", jdbcDriver) \
    .option("user", user) \
    .option("password", password) \
    .option("query", sqlQuery)
df = jdbcOptions.load()
df.show()
21/05/13 21:50:41 INFO SharedState: Warehouse path is 'file:/Users/asifkazi/Downloads/Projects/pyspark/spark-warehouse'.
Traceback (most recent call last):
File "/Users/asifkazi/Downloads/Projects/pyspark/test_jdbc.py", line 32, in <module>
df = jdbcOptions.load()
File "/usr/local/Cellar/apache-spark/3.1.1/libexec/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 210, in load
File "/usr/local/Cellar/apache-spark/3.1.1/libexec/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in __call__
File "/usr/local/Cellar/apache-spark/3.1.1/libexec/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
File "/usr/local/Cellar/apache-spark/3.1.1/libexec/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o35.load.
: java.sql.SQLException: Unrecognized connection property 'driver'
at com.facebook.presto.jdbc.PrestoDriverUri.validateConnectionProperties(PrestoDriverUri.java:353)
at com.facebook.presto.jdbc.PrestoDriverUri.<init>(PrestoDriverUri.java:104)
at com.facebook.presto.jdbc.PrestoDriverUri.<init>(PrestoDriverUri.java:94)
at com.facebook.presto.jdbc.PrestoDriver.connect(PrestoDriver.java:87)
at org.apache.spark.sql.execution.datasources.jdbc.connection.BasicConnectionProvider.getConnection(BasicConnectionProvider.scala:49)
at
Why can't I simply create a JDBC connection for Spark the way I would in plain JDBC, and then run queries against it independently? Is there a way to run the query without passing this information as part of the JDBC options?
Solution
You should wrap the query in a parenthesized subquery with an alias and pass that to the database as the table. Assuming the URL is fine, the snippet below should work; I don't have a Presto instance to test it against.
query = """(select * from customer limit 1) query_wrap"""
url = 'jdbc:presto:Server=spill.asifkazi.cp.ahana.cloud;Port=443'
connectionProperties = {"user": "user", "password": "password", "driver": "com.facebook.presto.jdbc.PrestoDriver", "fetchsize": "10000"}
df = spark.read.jdbc(url = url, table = query, properties = connectionProperties)
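The reason for the wrapping: Spark treats whatever you pass as the table argument (the dbtable option) as a table expression that it embeds in SQL it generates itself, e.g. for schema inference and for partitioned reads. A bare query cannot stand in that position, but a parenthesized subquery with an alias can. A minimal sketch of that wrapping as a plain string helper (the function name and default alias here are illustrative, not part of any Spark API):

```python
def as_dbtable(query: str, alias: str = "query_wrap") -> str:
    """Wrap a SQL query so it can stand where Spark's JDBC source
    expects a table expression (the `dbtable` option): a derived
    table with an alias."""
    # Drop any trailing semicolon: it is invalid inside a subquery.
    return f"({query.strip().rstrip(';')}) {alias}"

sql = "select * from customer limit 1"
dbtable = as_dbtable(sql)
print(dbtable)  # (select * from customer limit 1) query_wrap

# Spark can now splice this into SQL it generates itself, for
# example (roughly) a zero-row probe to infer the schema:
probe = f"SELECT * FROM {dbtable} WHERE 1=0"
print(probe)
```

This is also why the query option exists as a convenience in newer Spark versions: it performs essentially this wrapping for you, which is why dbtable and query are mutually exclusive.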