Running a script from the pyspark shell produces a "table not found" error

Problem description

I am trying to run a simple pyspark script through the pyspark shell, like this:

$pyspark < SPTest2.py

This fails with a "Table or view not found" exception:

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2021-10-11 14:58:07 WARN  Utils:66 - Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Traceback (most recent call last):
  File "xxxxxxxxx/spark-xxxxxx/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "xxxxxx/spark-xxxxxx/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py", line 320, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o49.sql.
: org.apache.spark.sql.AnalysisException: Table or view not found: `xxxx`.`stg_party_phone`; line 1 pos 14;
'GlobalLimit 100
+- 'LocalLimit 100
   +- 'Project [*]
      +- 'UnresolvedRelation `crz_csapp`.`stg_party_phone`
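
Whether the session that receives the redirected script can see the database at all can be checked with the catalog API. A small diagnostic sketch, assuming the shell's spark object and the same masked database name used in the query below:

# Runs in the pyspark shell, where `spark` already exists.
# listTables raises an AnalysisException if the database is unknown to the
# session's catalog, which would itself point at the problem.
print([db.name for db in spark.catalog.listDatabases()])
print([t.name for t in spark.catalog.listTables("xxxx")])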

However, when I run each line of the script manually at the pyspark shell prompt, it produces the results I expect. Here is the simple script:

from pyspark.sql import SparkSession

# Reuse the session created by the pyspark shell (or create a new one).
spark = SparkSession.builder.getOrCreate()

spark.conf.set("spark.executor.instances", 64)
spark.conf.set("spark.executor.cores", 1)
spark.conf.set("spark.sql.shuffle.partitions", 128)

from datetime import datetime
start_time = datetime.now()

# Read a small sample from the Hive table and materialize it.
sqlDF = spark.sql("SELECT * FROM xxxx.stg_party_phone limit 100")
sqlDF.count()
sqlDF.show(5)
print("read the data and created spark data frame")
end_time = datetime.now()

print('Time to Create dataframe reading Hive Table')
print('Duration: {}'.format(end_time - start_time))
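
One detail worth noting in the script: spark.executor.instances and spark.executor.cores describe resources that are allocated when the application starts, so setting them through spark.conf.set() on an already-running session generally has no effect; only spark.sql.shuffle.partitions is a runtime SQL setting. A minimal sketch, using the same values as the script, of passing them at build time instead:

from pyspark.sql import SparkSession

# Executor counts and cores are fixed at application launch, so they are
# passed to the builder (or to the launcher) rather than set afterwards;
# spark.sql.shuffle.partitions, by contrast, can be changed at runtime.
spark = (
    SparkSession.builder
    .config("spark.executor.instances", "64")
    .config("spark.executor.cores", "1")
    .config("spark.sql.shuffle.partitions", "128")
    .getOrCreate()
)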

Tags: apache-spark

Solution
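
A frequent cause of "Table or view not found" when a pyspark script constructs its own session is that the session ends up with the in-memory catalog instead of the Hive metastore. A minimal sketch, assuming the table xxxx.stg_party_phone is registered in the Hive metastore and that this is indeed the situation here, of requesting Hive support explicitly when the session is built:

from pyspark.sql import SparkSession

# Explicitly ask for a Hive-backed catalog; a plain getOrCreate() may fall
# back to the in-memory catalog depending on how the script is launched.
spark = (
    SparkSession.builder
    .appName("SPTest2")
    .enableHiveSupport()
    .getOrCreate()
)

sqlDF = spark.sql("SELECT * FROM xxxx.stg_party_phone LIMIT 100")
sqlDF.show(5)

Submitting the file with spark-submit SPTest2.py, rather than redirecting it into the interactive shell, is also the more common way to run such a script, since the session is then created and configured by the script itself instead of being inherited from the shell.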

