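python-3.x - pySpark with conda integration throws error: 'pyspark' is not recognized
Problem description
Steps followed: installed Java, Python, Spark, and Anaconda, and set the path for each. But running pyspark
at the command prompt does not open a Jupyter notebook.
I get the following error:
"'pyspark' is not recognized as an internal or external command, operable program or batch file."
Solution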
Follow these steps:
Install Java
1.Download Python
Python 3.x
[https://www.python.org/downloads/][1]
2.Set Path
If you selected the option to add Python to the path during installation, you don't have to set the path manually.
3.Verify whether Python is installed
a)
Cmd>python -V
b)
Open the Python interactive shell by typing the "python" command in the terminal
Install Spark
Verify whether PySpark is installed:-
===================================================
Cmd>pyspark
It will open the PySpark shell, i.e. an interactive Python shell
The interactive shell lets you type and run Python statements one at a time
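A quick way to see why the command prompt reports that 'pyspark' is not recognized is to check which of the required commands actually resolve on the path. The following is a minimal sketch using only the Python standard library. The function name missing_commands and the default list of commands are illustrative assumptions, not part of Spark.

```python
import shutil


def missing_commands(names=("java", "python", "pyspark")):
    """Return the commands from `names` that cannot be found on PATH."""
    # shutil.which performs the same lookup as the shell: it returns the full
    # path of the executable, or None when the command cannot be resolved.
    return [name for name in names if shutil.which(name) is None]


missing = missing_commands()
if missing:
    print("Not on PATH:", ", ".join(missing))
else:
    print("java, python and pyspark are all on PATH")
```

If pyspark shows up as missing, one likely cause is that the bin folder of the Spark installation has not been added to the Path variable.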
First PySpark Application:-
===================================================
We can write a PySpark application in 2 modes. They are:
1.Interactive -- PySpark shell
2.Batch application -- IDEs (Integrated Development Environments)
(Jupyter Notebook, PyCharm, etc.)
How to develop the first PySpark application in interactive mode?
===================================================
e.g. load a local file, count the number of rows, and print the data
Cmd>pyspark
--> It will open the PySpark shell
--> It creates a SparkContext with the variable name "sc"
--> SparkContext is a predefined class; it is required to write a Spark application
>>>sc
<SparkContext master=local[*] appName=PySparkShell>
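The example described above (load a local file, count the rows, print the data) could look like the following sketch. The helper name load_and_count and the file name in the usage note are hypothetical. textFile, count and take are standard RDD calls available through the SparkContext that the shell exposes as sc.

```python
def load_and_count(sc, path, preview=5):
    """Load a text file as an RDD and return (row_count, first_rows).

    `sc` is the SparkContext that the PySpark shell creates for you.
    """
    rdd = sc.textFile(path)       # one RDD element per line of the file
    row_count = rdd.count()       # action: counts the lines
    first_rows = rdd.take(preview)  # action: fetches the first few lines
    return row_count, first_rows
```

In the PySpark shell, where sc already exists, it could be called like this (data.txt is a placeholder file name):
>>>count, rows = load_and_count(sc, "data.txt")
>>>print(count)
>>>print(rows)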
ANACONDA Installation:
============================================
Jupyter Notebook installation
1.Download Anaconda
https://www.anaconda.com/distribution/
2.Install Anaconda
Double-click the .exe file and choose all the default options
3.Set the Path variable (this is optional if you selected the option to add Anaconda to the path at the time of
installation)
4.Start Anaconda and open Jupyter
Configuring PySpark with Jupyter Notebook:-
============================================
1.Python or Anaconda must be installed (with Jupyter Notebook)
2.PySpark must be installed.
How to open Pyspark:
==================
Cmd>pyspark
How to start Jupyter Notebook for PySpark:
==========================
We can start Jupyter notebook in two ways. They are:
1.Start Anaconda Navigator--->Launch Jupyter Notebook
2.Open command prompt and type
Cmd>jupyter notebook
Here we write the Python application
Set Environment Variables:-
=========================
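PYSPARK_DRIVER_PYTHON=jupyter
PYSPARK_DRIVER_PYTHON_OPTS=notebook
With these two variables set, Cmd>pyspark starts Jupyter Notebook instead of the plain PySpark shell.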
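The two driver variables can also be set for a single launch instead of system-wide. This is only a sketch, assuming pyspark is already on the path. The helper name jupyter_driver_env is made up for illustration.

```python
import os


def jupyter_driver_env(base_env=None):
    """Return a copy of the environment that makes pyspark start Jupyter."""
    env = dict(os.environ if base_env is None else base_env)
    # Use Jupyter as the front end for the PySpark driver ...
    env["PYSPARK_DRIVER_PYTHON"] = "jupyter"
    # ... and pass "notebook" to it, so that pyspark runs "jupyter notebook".
    env["PYSPARK_DRIVER_PYTHON_OPTS"] = "notebook"
    return env


env = jupyter_driver_env()
print(env["PYSPARK_DRIVER_PYTHON"], env["PYSPARK_DRIVER_PYTHON_OPTS"])
# To actually launch (assumes pyspark is on PATH):
#   import subprocess
#   subprocess.run("pyspark", env=env, shell=True)
```

Because the function works on a copy, the variables of the current process and of the system stay unchanged.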
[1]: https://www.python.org/downloads/