Pyspark: SPARK_HOME may not be configured correctly

Problem description

I am trying to run pyspark from a notebook inside a conda environment.

$ which python

Inside the environment 'env', this returns:

/Users/<username>/anaconda2/envs/env/bin/python

and outside the environment:

/Users/<username>/anaconda2/bin/python

My .bashrc file has:

export PATH="/Users/<username>/anaconda2/bin:$PATH"

export JAVA_HOME=`/usr/libexec/java_home`
export SPARK_HOME=/usr/local/Cellar/apache-spark/3.1.2   
export PYTHONPATH=$SPARK_HOME/libexec/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.9-src.zip:$PYTHONPATH 
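
As a sanity check on those paths (note that the two PYTHONPATH lines assume different layouts, one with a libexec subdirectory and one without; which one actually exists on disk is my assumption to verify), something like this can be run:

import os
from glob import glob

# Same SPARK_HOME as the .bashrc export above.
spark_home = os.environ.get("SPARK_HOME", "/usr/local/Cellar/apache-spark/3.1.2")

# Check both layouts referenced by the PYTHONPATH exports.
print(os.path.isdir(os.path.join(spark_home, "libexec", "python")))
print(glob(os.path.join(spark_home, "python", "lib", "py4j-*.zip")))
print(glob(os.path.join(spark_home, "libexec", "python", "lib", "py4j-*.zip")))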

However, when I run:

import findspark

findspark.init()

I get the error:

Exception: Unable to find py4j, your SPARK_HOME may not be configured correctly

Any ideas?

Full traceback

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
~/anaconda2/envs/ai/lib/python3.7/site-packages/findspark.py in init(spark_home, python_path, edit_rc, edit_profile)
    142     try:
--> 143         py4j = glob(os.path.join(spark_python, "lib", "py4j-*.zip"))[0]
    144     except IndexError:

IndexError: list index out of range

During handling of the above exception, another exception occurred:

Exception                                 Traceback (most recent call last)
/var/folders/dx/dfb8h2h925l7vmm7y971clpw0000gn/T/ipykernel_72686/1796740182.py in <module>
      1 import findspark
      2 
----> 3 findspark.init()

~/anaconda2/envs/ai/lib/python3.7/site-packages/findspark.py in init(spark_home, python_path, edit_rc, edit_profile)
    144     except IndexError:
    145         raise Exception(
--> 146             "Unable to find py4j, your SPARK_HOME may not be configured correctly"
    147         )
    148     sys.path[:0] = [spark_python, py4j]

Exception: Unable to find py4j, your SPARK_HOME may not be configured correctly
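
Going by the signature shown in the traceback, init(spark_home, python_path, edit_rc, edit_profile), findspark.init can also be given an explicit spark_home instead of relying on the environment variable. A minimal sketch, assuming the Homebrew install keeps the Python bindings under libexec (the path below is an assumption, not confirmed):

import findspark

# Pass spark_home explicitly rather than relying on the SPARK_HOME env var.
# The libexec path is an assumption about the Homebrew layout.
findspark.init("/usr/local/Cellar/apache-spark/3.1.2/libexec")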

Edit:

If I run the following in the notebook:

import pyspark

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

I get the error:

/usr/local/Cellar/apache-spark/3.1.2/bin/load-spark-env.sh: line 2: /usr/local/Cellar/apache-spark/3.1.2/libexec/bin/load-spark-env.sh: Permission denied
/usr/local/Cellar/apache-spark/3.1.2/bin/load-spark-env.sh: line 2: exec: /usr/local/Cellar/apache-spark/3.1.2/libexec/bin/load-spark-env.sh: cannot execute: Undefined error: 0
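
Since this error complains that load-spark-env.sh cannot be executed, one hedged check is to inspect its permission bits (path copied from the error output above):

import os
import stat

# Path taken from the "Permission denied" message above.
path = "/usr/local/Cellar/apache-spark/3.1.2/libexec/bin/load-spark-env.sh"

st = os.stat(path)
print(stat.filemode(st.st_mode))   # e.g. '-rw-r--r--' vs '-rwxr-xr-x'
print(os.access(path, os.X_OK))    # True if the current user can execute it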

Tags: pyspark
