apache-spark - Why is SparkSession.sql() not for SELECT queries?
Problem description
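I'm curious why the SparkSession documentation says sql() does not apply to SELECT queries.
Is there anything wrong with using it for SELECT queries anyway?
Thanks for your replies!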
/**
 * Executes a SQL query using Spark, returning the result as a `DataFrame`.
 * This API eagerly runs DDL/DML commands, but not for SELECT queries.
 *
 * @since 2.0.0
 */
def sql(sqlText: String): DataFrame = withActive {
  val tracker = new QueryPlanningTracker
  val plan = tracker.measurePhase(QueryPlanningTracker.PARSING) {
    sessionState.sqlParser.parsePlan(sqlText)
  }
  Dataset.ofRows(self, plan, tracker)
}
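To make the doc comment's eager/lazy split concrete, here is a toy model in Scala. Everything in it (ToyPlan, ToyCommand, ToyQuery, toyParse, ToyDataFrame, toySql, executionLog) is hypothetical and not part of Spark's API. In particular, toyParse classifies statements by their first keyword only, whereas Spark's real parser builds a full logical plan and the eager/lazy behavior depends on the plan type. The sketch only mirrors the shape of sql() above: parse the text into a plan, wrap it in a DataFrame-like object, and execute the plan at construction time only if it is a command.

```scala
import scala.collection.mutable.ArrayBuffer

// Toy stand-ins for Spark's logical plans (hypothetical, not Spark classes).
sealed trait ToyPlan
final case class ToyCommand(sql: String) extends ToyPlan // DDL/DML
final case class ToyQuery(sql: String) extends ToyPlan   // SELECT

// Hypothetical "parser": classifies by the leading keyword only.
def toyParse(sqlText: String): ToyPlan =
  if (sqlText.trim.toUpperCase.startsWith("SELECT")) ToyQuery(sqlText)
  else ToyCommand(sqlText)

// Records what actually "executed", so the timing is observable.
val executionLog = ArrayBuffer.empty[String]

// Toy DataFrame: a command runs as soon as the object is built (eager);
// a query only runs when an action such as collect() is called (lazy).
final class ToyDataFrame(val plan: ToyPlan) {
  plan match {
    case ToyCommand(sql) => executionLog += s"ran: $sql"
    case ToyQuery(_)     => () // nothing happens yet
  }

  def collect(): Seq[String] = plan match {
    case ToyQuery(sql) =>
      executionLog += s"ran: $sql"
      Seq("row1", "row2") // fake result rows
    case ToyCommand(_) => Seq.empty
  }
}

// Mirrors the shape of SparkSession.sql: parse, then wrap the plan.
def toySql(sqlText: String): ToyDataFrame = new ToyDataFrame(toyParse(sqlText))
```

In this sketch, building the result of a CREATE TABLE statement already appends to executionLog, while the result of a SELECT leaves the log untouched until collect() is called on it.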
Solution
I think the docs mean that sql() eagerly runs DDL/DML commands but does not eagerly run SELECT queries. That's the nature of Spark's lazy evaluation: a SELECT query is a transformation, so Spark never runs it eagerly; it only adds it to a query plan and waits until you call an action.
DDL/DML commands, by contrast, behave like actions, so Spark runs them eagerly as soon as sql() is called.
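The same lazy-transformation/eager-action split can be seen without Spark by using Scala collection views as an analogy. This is Scala's own laziness, not Spark's execution engine, and the names evaluations, double, pipeline, and runAction are illustrative only. Building the view merely records map and filter; nothing is computed until toList forces it, much as nothing runs for a SELECT until an action is called.

```scala
// Counts how many times the "transformation" function actually runs.
var evaluations = 0

def double(x: Int): Int = { evaluations += 1; x * 2 }

// Building the pipeline is like writing a SELECT: transformations are only recorded.
val pipeline = (1 to 5).view.map(double).filter(_ > 4)

// Forcing the view is like calling an action (collect/count/show).
def runAction(): List[Int] = pipeline.toList
```

Calling runAction() a second time re-runs the whole pipeline, which loosely parallels how Spark recomputes an uncached DataFrame on every action.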
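So, to answer your question, it's totally fine to use spark.sql to run SELECT queries. It returns a DataFrame representing the result of the query; the query itself runs when you call an action on that DataFrame, such as show(), collect(), or count().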