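python - Cannot run PySpark properly in PyCharm
Problem description
I have PySpark 3.1.2 and Python 3.8.3 installed on Windows. All paths are also set correctly in the environment variables: spark_home, hadoop_home, and path. When I run the code below, I still get the following error: the system cannot find the file specified.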
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()
data2 = [("James", "abs"),
         ("Michael", "Rose"),
         ]
schema = StructType([
    StructField("firstname", StringType(), True),
    StructField("middlename", StringType(), True),
])
df = spark.createDataFrame(data=data2, schema=schema)
df.printSchema()
df.show(truncate=False)
The error is as follows:
21/09/01 14:36:19 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.io.IOException: Cannot run program "python3": CreateProcess error=2, The system cannot find the file specified
    at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1128)
    at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1071)
    at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:165)
    .....
    .....
1/09/01 14:36:19 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
Traceback (most recent call last):
  File "C:/Users/abc123/PycharmProjects/pythonProject/test.py", line 18, in <module>
    df.show(truncate=False)
  File "C:\Spark\spark-3.1.2-bin-hadoop3.2\python\lib\pyspark.zip\pyspark\sql\dataframe.py", line 486, in show
  File "C:\Spark\spark-3.1.2-bin-hadoop3.2\python\lib\py4j-0.10.9-src.zip\py4j\java_gateway.py", line 1304, in __call__
  File "C:\Spark\spark-3.1.2-bin-hadoop3.2\python\lib\pyspark.zip\pyspark\sql\utils.py", line 111, in deco
  File "C:\Spark\spark-3.1.2-bin-hadoop3.2\python\lib\py4j-0.10.9-src.zip\py4j\protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o39.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (B****.a*.******.com executor driver): java.io.IOException: Cannot run program "python3": CreateProcess error=2, The system cannot find the file specified
    at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1128)
    at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1071)
    ....
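Everything works up to df.printSchema(), but as soon as I run an action such as df.show() or df.count(), I get the error above. All paths are set correctly in my environment variables, and Python itself runs fine, yet I still cannot solve this. Please help me fix it.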
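In the trace, the task fails inside `PythonWorkerFactory.createSimpleWorker` while trying to start a program literally named `python3`. That fits the symptom: `printSchema()` runs no job, whereas `show()` and `count()` launch tasks that, for this DataFrame, start Python worker processes. A likely cause, offered as an assumption rather than a confirmed answer: PySpark 3.x takes the worker command from the `PYSPARK_PYTHON` environment variable and falls back to `python3` when it is unset, and a standard Windows Python install typically provides `python.exe` but no `python3` on `PATH`. The sketch below (the helper names `describe_worker_python` and `pin_worker_python` are hypothetical) reports what Spark would try to start and then points `PYSPARK_PYTHON` at the interpreter PyCharm is using:

```python
import os
import shutil
import sys


def describe_worker_python() -> dict:
    """Report which program PySpark would try to start for its Python workers.

    Assumption: PySpark 3.x behaviour, where the worker command comes from
    the PYSPARK_PYTHON environment variable and defaults to "python3".
    """
    command = os.environ.get("PYSPARK_PYTHON", "python3")
    return {
        "command": command,
        # shutil.which returns None when the command is not found on PATH,
        # roughly mirroring the "cannot find the file specified" failure.
        "resolved": shutil.which(command),
        "current_interpreter": sys.executable,
    }


def pin_worker_python() -> str:
    """Point PYSPARK_PYTHON at the interpreter running this script.

    Must run before the SparkSession/SparkContext is created, because
    PySpark reads the variable when the context starts. This overwrites
    any existing value on purpose.
    """
    os.environ["PYSPARK_PYTHON"] = sys.executable
    return sys.executable


if __name__ == "__main__":
    print("before:", describe_worker_python())
    pin_worker_python()
    print("after: ", describe_worker_python())
```

To apply it, call `pin_worker_python()` (or simply set `os.environ["PYSPARK_PYTHON"] = sys.executable`) at the top of `test.py`, before `SparkSession.builder.appName('SparkByExamples.com').getOrCreate()`. Setting `PYSPARK_PYTHON` to the full path of `python.exe` in the PyCharm run configuration's environment variables should have the same effect without changing the code.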