FileNotFoundError: [WinError 2] The system cannot find the file specified when running pyspark in cmd/PyCharm

Problem description

I am trying to run the Python file below in PyCharm. I hit the same problem whether I launch pyspark from cmd or from PyCharm. Could someone help me resolve this? Thanks in advance.

Code:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DateType, DecimalType, IntegerType
spark = SparkSession.builder.master("local[*]").appName("ETL").getOrCreate()
spark.sparkContext.setLogLevel("WARN")
source_data_file = "C:\Python_pgms\apache-spark-etl-pipeline-example-master\apache-spark-etl-pipeline-example-master\data\20160104\*"
print("Fetching")

The error I am facing:

Traceback (most recent call last):
  File "C:/Python_pgms/apache-spark-etl-pipeline-example-master/apache-spark-etl-pipeline-example-master/src/etl.py", line 5, in <module>
    spark = SparkSession.builder.master("local[*]").appName("ETL").getOrCreate()
  File "C:\Spark\spark-3.0.1-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\sql\session.py", line 186, in getOrCreate
  File "C:\Spark\spark-3.0.1-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\context.py", line 376, in getOrCreate
  File "C:\Spark\spark-3.0.1-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\context.py", line 133, in __init__
  File "C:\Spark\spark-3.0.1-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\context.py", line 325, in _ensure_initialized
  File "C:\Spark\spark-3.0.1-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\java_gateway.py", line 98, in launch_gateway
  File "C:\Users\comp\AppData\Local\Programs\Python\Python36\lib\subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "C:\Users\comp\AppData\Local\Programs\Python\Python36\lib\subprocess.py", line 997, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

Process finished with exit code 1

Tags: python, apache-spark, pyspark

Solution


PyCharm -> Run -> Edit Configurations -> Environment variables

Add PYTHONPATH and SPARK_HOME there, with values that match your own installation paths. The traceback shows Py4J's launch_gateway failing inside subprocess, which on Windows is the classic symptom of spark-submit not being found because SPARK_HOME is missing.
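
Alternatively, you can set the two variables from the script itself before building the SparkSession. This is only a minimal sketch: the Spark path is copied from the traceback above, and the py4j zip file name depends on your distribution, so check what actually sits under %SPARK_HOME%\python\lib.

import os
import sys

# Point SPARK_HOME at the Spark installation (path taken from the traceback above).
os.environ.setdefault("SPARK_HOME", r"C:\Spark\spark-3.0.1-bin-hadoop2.7")

# Make the bundled PySpark and Py4J importable; verify the py4j zip name in python\lib.
spark_python = os.path.join(os.environ["SPARK_HOME"], "python")
py4j_zip = os.path.join(spark_python, "lib", "py4j-0.10.9-src.zip")
sys.path[:0] = [spark_python, py4j_zip]
os.environ["PYTHONPATH"] = os.pathsep.join([spark_python, py4j_zip])

from pyspark.sql import SparkSession

# With SPARK_HOME set, launch_gateway can find spark-submit.cmd and the
# FileNotFoundError should no longer occur.
spark = SparkSession.builder.master("local[*]").appName("ETL").getOrCreate()
print(spark.version)

Note that Spark still needs a working Java installation (JAVA_HOME set or java on PATH); if spark-submit is found but Java is not, you will get a different error.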

