Why does SparkSession.sql() not apply to SELECT queries?


I'm curious why the SparkSession documentation says that sql() does not apply to SELECT commands.



  /**
   * Executes a SQL query using Spark, returning the result as a `DataFrame`.
   * This API eagerly runs DDL/DML commands, but not for SELECT queries.
   *
   * @since 2.0.0
   */
  def sql(sqlText: String): DataFrame = withActive {
    val tracker = new QueryPlanningTracker
    val plan = tracker.measurePhase(QueryPlanningTracker.PARSING) {
      sessionState.sqlParser.parsePlan(sqlText)
    }
    Dataset.ofRows(self, plan, tracker)
  }

Tags: apache-spark, apache-spark-sql


I think the docs mean that sql() eagerly runs DDL/DML commands, but does not eagerly run SELECT queries. That's the nature of Spark's lazy evaluation: a SELECT query is a transformation, so Spark only adds it to the query plan and does not execute it until you call an action on the resulting DataFrame.

DDL/DML commands, on the other hand, behave like actions, so they are run eagerly, as soon as sql() is called.

So, to answer your question, it's totally fine to use spark.sql to run SELECT queries. It returns a DataFrame representing the query, and the query itself runs once you call an action such as show(), count() or collect() on it.
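
To make the difference concrete, here is a minimal sketch, assuming a local Spark 3.x session. The object name SqlEagernessSketch, the method countAfterLazySelect and the view name demo_v are made up for illustration and are not part of Spark.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object, only for illustration.
object SqlEagernessSketch {

  def countAfterLazySelect(spark: SparkSession): Long = {
    // DDL: this command is executed by the time sql() returns,
    // so the view is already registered in the catalog afterwards.
    spark.sql(
      "CREATE OR REPLACE TEMPORARY VIEW demo_v AS " +
        "SELECT * FROM VALUES (1), (2), (3) AS t(id)")
    assert(spark.catalog.tableExists("demo_v"))

    // SELECT: sql() only parses and analyzes the query here.
    // No job is run yet; df just describes the query.
    val df = spark.sql("SELECT id FROM demo_v WHERE id > 1")

    // The query is actually executed only when an action is called.
    df.count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: running locally
      .appName("sql-eagerness-sketch")
      .getOrCreate()
    try {
      println(countAfterLazySelect(spark)) // prints 2
    } finally {
      spark.stop()
    }
  }
}
```

If you watch the Spark UI while stepping through this, you should see no job for the SELECT until count() is called.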
