Why is SparkSession.sql() not meant for SELECT queries?

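Question

I'm curious why the SparkSession documentation says that sql() is not for SELECT commands.

Is there any problem if I insist on using it that way?

Thanks for your reply!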

  /**
   * Executes a SQL query using Spark, returning the result as a `DataFrame`.
   * This API eagerly runs DDL/DML commands, but not for SELECT queries.
   *
   * @since 2.0.0
   */
  def sql(sqlText: String): DataFrame = withActive {
    val tracker = new QueryPlanningTracker
    val plan = tracker.measurePhase(QueryPlanningTracker.PARSING) {
      sessionState.sqlParser.parsePlan(sqlText)
    }
    Dataset.ofRows(self, plan, tracker)
  }

Tags: apache-spark, apache-spark-sql

Solution


I think the docs mean that sql() eagerly runs DDL/DML commands but does not eagerly run SELECT queries. That's the nature of Spark's lazy evaluation: a SELECT query is a transformation, so sql() only parses and analyzes it into a query plan, and nothing is executed until you call an action such as show(), collect(), or count() on the returned DataFrame.

DDL/DML commands, on the other hand, behave like actions: they exist for their side effects, so they are run eagerly as soon as sql() is called.

So, to answer your question, it's totally fine to use spark.sql to run SELECT queries. It returns a DataFrame for the results of the query, and those results are computed when you call an action on it.
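
To make the difference concrete, here is a minimal sketch, assuming a local Spark installation is on the classpath. The object name SqlEagernessSketch, the view name numbers, and the local master setting are all made up for illustration and do not come from the question. It shows a DDL statement taking effect as soon as sql() returns, while a SELECT is only executed once an action is called on the resulting DataFrame.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object, not part of Spark itself.
object SqlEagernessSketch {
  // Returns (whether the view exists right after the DDL call, rows produced by the SELECT).
  def run(spark: SparkSession): (Boolean, Seq[Long]) = {
    // DDL: executed eagerly, so the temporary view is registered when sql() returns.
    spark.sql("CREATE OR REPLACE TEMPORARY VIEW numbers AS SELECT id FROM range(5)")
    val viewExists = spark.catalog.tableExists("numbers")

    // SELECT: sql() parses and analyzes the query, but no Spark job runs yet.
    val evens = spark.sql("SELECT id FROM numbers WHERE id % 2 = 0")

    // collect() is an action; only now is the query actually executed.
    val rows = evens.collect().map(_.getLong(0)).sorted.toSeq
    (viewExists, rows)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("sql-eagerness-sketch")
      .getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```

Note that analysis still happens inside sql(), so a SELECT against a table that does not exist fails immediately with an AnalysisException even though no job has been launched.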

