pyspark - PySpark not starting - Windows 10
Problem description
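I am trying to set up Spark for Python on a Windows 10 Pro machine. I have done the following:
- Installed Anaconda with Python 3.7
- Installed JDK 8
- Installed the pre-built Spark 2.4.6 for Hadoop 2.7
- Downloaded winutils.exe
- Set all the environment variables, including the user Path
- Created a C:\tmp\hive folder
- Successfully ran the command winutils.exe chmod -R 777 C:\tmp\hive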
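The setup steps above can be sanity-checked from the same command prompt before launching pyspark. This is a minimal stdlib-only sketch, assuming the conventional variable names JAVA_HOME, SPARK_HOME and HADOOP_HOME and that winutils.exe lives in %HADOOP_HOME%\bin (the question does not say which variables were set). The function name check_spark_env is hypothetical.

```python
import os
from pathlib import PureWindowsPath

# Assumption: the conventional variable names for a Windows Spark setup.
# The question only says "all environment variables" were set.
REQUIRED_VARS = ("JAVA_HOME", "SPARK_HOME", "HADOOP_HOME")


def check_spark_env(env, exists=os.path.exists):
    """Return a list of problems found in `env` (a mapping like os.environ).

    `exists` is injectable so the check can run without touching the disk.
    """
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]

    # Assumption: winutils.exe is placed in %HADOOP_HOME%\bin.
    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home:
        winutils = str(PureWindowsPath(hadoop_home) / "bin" / "winutils.exe")
        if not exists(winutils):
            problems.append(f"winutils.exe not found at {winutils}")

    # %SPARK_HOME%\bin should be on PATH so `pyspark` resolves from any folder.
    spark_home = env.get("SPARK_HOME")
    if spark_home:
        spark_bin = str(PureWindowsPath(spark_home) / "bin")
        entries = [p.rstrip("\\").lower() for p in env.get("PATH", "").split(";")]
        if spark_bin.lower() not in entries:
            problems.append(f"{spark_bin} is not on PATH")

    # Hive scratch directory created (and chmod-ed) in the steps above.
    if not exists(r"C:\tmp\hive"):
        problems.append(r"C:\tmp\hive does not exist")
    return problems


if __name__ == "__main__":
    for line in check_spark_env(os.environ) or ["no problems found"]:
        print(line)
```

Run it in a newly opened command prompt: values changed in the System Properties dialog are not picked up by windows that were already open.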
When I try to launch pyspark from the command prompt, the following text is printed and then nothing happens, not even an error:
(base) C:\Spark\bin>pyspark
Python 3.7.6 (default, Jan 8 2020, 20:23:39) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
20/08/03 07:49:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
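The WARN line in that output is logged from inside the JVM, so Java does start. The Spark 2.4 documentation lists Java 8 as the supported runtime, so it may be worth confirming that the JVM actually being launched is the JDK 8 from the setup steps and not some other installed Java. A minimal sketch, assuming the launcher uses the java binary under JAVA_HOME when that variable is set (falling back to java on PATH); both function names are hypothetical.

```python
import os
import re
import subprocess


def java_major_version(version_output):
    """Extract the major Java version from `java -version` output.

    Java 8 and earlier report e.g. "1.8.0_261"; Java 9+ report e.g. "11.0.8".
    """
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        raise ValueError("no version string found in java -version output")
    parts = match.group(1).split(".")
    if parts[0] == "1" and len(parts) > 1:
        return int(parts[1])  # legacy "1.x" numbering scheme
    leading = re.match(r"\d+", parts[0])
    if leading is None:
        raise ValueError(f"unrecognised version {match.group(1)!r}")
    return int(leading.group())


def installed_java_major(env=None):
    """Run the java binary under JAVA_HOME (or on PATH) and return its major version.

    Assumption: this is the same JVM that the Spark launch scripts pick up.
    """
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    java = os.path.join(java_home, "bin", "java") if java_home else "java"
    # `java -version` prints to stderr, not stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return java_major_version(result.stderr)


if __name__ == "__main__":
    major = installed_java_major()
    print(f"Java {major}", "(OK for Spark 2.4)" if major == 8 else "(Spark 2.4 expects Java 8)")
```

The parsing is kept separate from the subprocess call so it can be checked against sample version strings without a Java install.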
After more than an hour, this error is finally printed:
Traceback (most recent call last):
  File "C:\Program Files\Python37\lib\socket.py", line 589, in readinto
    return self._sock.recv_into(b)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Spark\python\pyspark\shell.py", line 41, in <module>
    spark = SparkSession._create_shell_session()
  File "C:\Spark\python\pyspark\sql\session.py", line 573, in _create_shell_session
    return SparkSession.builder\
  File "C:\Spark\python\pyspark\sql\session.py", line 173, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "C:\Spark\python\pyspark\context.py", line 367, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "C:\Spark\python\pyspark\context.py", line 136, in __init__
    conf, jsc, profiler_cls)
  File "C:\Spark\python\pyspark\context.py", line 198, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "C:\Spark\python\pyspark\context.py", line 306, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "C:\Spark\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1523, in __call__
  File "C:\Spark\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 985, in send_command
  File "C:\Spark\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1152, in send_command
  File "C:\Program Files\Python37\lib\socket.py", line 589, in readinto
    return self._sock.recv_into(b)
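In the traceback, the socket.py frames point to C:\Program Files\Python37, while the banner printed at startup identifies an Anaconda interpreter, so more than one Python 3.7 install appears to be present. One inexpensive thing to rule out is which interpreter is actually in play. A small stdlib-only sketch that prints the running interpreter and the PYSPARK_PYTHON / PYSPARK_DRIVER_PYTHON variables Spark reads; the helper name interpreter_report is hypothetical, and the fallback behavior noted in the docstring is stated as "typically", not guaranteed.

```python
import os
import sys


def interpreter_report(env=None, executable=None):
    """Collect the interpreter paths involved in a pyspark launch.

    PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are the environment variables
    Spark reads to pick the worker and driver interpreters; "<unset>" means
    the launcher falls back to its default (typically `python` on PATH).
    """
    env = os.environ if env is None else env
    return {
        "running": sys.executable if executable is None else executable,
        "PYSPARK_PYTHON": env.get("PYSPARK_PYTHON", "<unset>"),
        "PYSPARK_DRIVER_PYTHON": env.get("PYSPARK_DRIVER_PYTHON", "<unset>"),
    }


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(f"{key:>22}: {value}")
```

Run it with the same python that the (base) prompt resolves, so the "running" path can be compared with the paths shown in the traceback.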
Solution