How do I print out a spark.sql object?

Problem description

I have a spark.sql object that includes a couple of variables.

import com.github.nscala_time.time.Imports.LocalDate

val first_date = new LocalDate(2020, 4, 1)
val second_date = new LocalDate(2020, 4, 7)

val mydf = spark.sql(s"""
        select *
        from tempView
        where timestamp between '{0}' and '{1}'
""".format(start_date.toString, end_date.toString))

I want to print out mydf because I ran mydf.count and got 0 as the outcome.

I ran mydf and got back mydf: org.apache.spark.sql.DataFrame = [column: type]

I also tried println(mydf) and it didn't return the query.

There is this related question, but it does not have the answer.

How can I print out the query?

Tags: apache-spark, pyspark, apache-zeppelin

Solution


The easiest way is to store your query in a variable and then print out the variable to see the query.

  • Use the variable with spark.sql

Example:

In Spark-scala:

val start_date="2020-01-01"
val end_date="2020-02-02"
val query = s"""select * from tempView where timestamp between '${start_date}' and '${end_date}'"""
print(query)
//select * from tempView where timestamp between '2020-01-01' and '2020-02-02'

spark.sql(query)
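
As a side note: if you already have the DataFrame (mydf in the question), evaluating it or calling println(mydf) only shows its schema, and DataFrame.explain() prints the query plan rather than the original SQL string. That is why keeping the SQL text in a variable and printing it, as above, is the simplest way to see exactly what was submitted. A minimal sketch, assuming query was built as shown:

val mydf = spark.sql(query)
mydf.explain()      // prints the physical plan only
mydf.explain(true)  // extended mode: logical and physical plans, still not the raw SQL text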

In Pyspark:

start_date="2020-01-01"
end_date="2020-02-02"
query="""select * from tempView where timestamp between'{0}' and '{1}'""".format(start_date,end_date)

print(query)
#select * from tempView where timestamp between'2020-01-01' and '2020-02-02'

#use same query in spark.sql
spark.sql(query)
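
Applying the same idea to the Scala snippet in the question: Scala's String.format (unlike Python's str.format) uses printf-style %s specifiers, so '{0}' and '{1}' are passed to Spark literally, which would explain the count of 0, and printing the query makes that visible immediately. A minimal sketch, assuming tempView is already registered:

import com.github.nscala_time.time.Imports.LocalDate

val first_date = new LocalDate(2020, 4, 1)
val second_date = new LocalDate(2020, 4, 7)

// build the SQL with string interpolation so the dates are substituted into the text
val query = s"""
        select *
        from tempView
        where timestamp between '${first_date.toString}' and '${second_date.toString}'
"""

println(query)  // inspect the exact SQL before running it
val mydf = spark.sql(query)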
