Understanding SparkSQL createOrReplaceTempView performance

Problem description

One common way to run Spark as SQL code is to use createOrReplaceTempView, i.e.:

df.createOrReplaceTempView('table_view_name')

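How does calling df.createOrReplaceTempView multiple times affect performance when the same df is passed to several functions that each apply some transformations? For example: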

# file_a.py
class Class_A:
    def function_a(self, df):
        df.createOrReplaceTempView('table_view_name')
        ...  # remaining steps elided in the original; main.py expects a DataFrame back

# file_b.py
class Class_B:
    def function_b1(self, df):
        df.createOrReplaceTempView('table_view_name')
        ...  # remaining steps elided in the original

    def function_b2(self, df):
        df.createOrReplaceTempView('table_view_name')
        ...  # remaining steps elided in the original

# file_c.py
class Class_C:
    def function_c1(self, df):
        df.createOrReplaceTempView('table_view_name')
        ...  # remaining steps elided in the original

# main.py
from file_a import Class_A
from file_b import Class_B
from file_c import Class_C

class_a = Class_A()
class_b = Class_B()
class_c = Class_C()

# `spark` is the active SparkSession (predefined in Databricks notebooks)
sample_df = spark.read.parquet("...")
sample_df = class_a.function_a(sample_df)

# Note: class_b has two transformations; the second one uses the output of the first
sample_df_b = class_b.function_b1(sample_df)
sample_df = class_b.function_b2(sample_df_b)

# Note: class_c has one transformation, but it takes the output of class_a as input
# (as written, sample_df was reassigned to the output of function_b2 above)
sample_df_c = class_c.function_c1(sample_df)

Now I would like to know:

Tags: python, apache-spark, pyspark, apache-spark-sql, databricks

Solution

