python - Unable to run PySpark code locally after Spark configuration
Problem description
Even after completing all the required Spark configuration, I cannot run the PySpark code I exported from Databricks; the run fails with the error shown below. Please help me resolve it.
Code:
# Databricks notebook source
from pyspark import SparkConf, SparkContext
# COMMAND ----------
conf = SparkConf().setAppName('Read File')
# COMMAND ----------
sc = SparkContext.getOrCreate(conf=conf)
# COMMAND ----------
text = sc.textFile('sample.txt')
# COMMAND ----------
print(text.collect())
# COMMAND ----------
sc.stop()
# COMMAND ----------
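(The # COMMAND ---------- lines are cell delimiters that Databricks inserts when a notebook is exported as a .py file; they are ordinary Python comments and harmless outside Databricks.)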
Error:
C:\Users\Abhishikth\Desktop>spark-submit 'fsc2.py'
21/10/18 08:03:34 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" org.apache.spark.SparkException: Failed to get main class in JAR with error 'File file:/C:/Users/Abhishikth/Desktop/'fsc2.py' does not exist'. Please specify one with --class.
at org.apache.spark.deploy.SparkSubmit.error(SparkSubmit.scala:968)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:486)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Solution
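The stack trace points at the quoting: on the Windows command prompt (cmd.exe), single quotes are not quoting characters, so spark-submit receives the literal argument 'fsc2.py', quotes included. Because that name no longer ends in .py, Spark does not recognize it as a Python script, falls back to treating it as a JAR, and fails trying to read a main class from a file that does not exist. Submit the script without quotes (or with double quotes, which cmd.exe does strip):

C:\Users\Abhishikth\Desktop>spark-submit fsc2.py

Note also that sc.textFile('sample.txt') resolves the relative path against the directory you submit from, so keep sample.txt next to the script or pass an absolute path.

If you want the script to run locally without extra spark-submit flags, a minimal sketch (assuming local mode on all cores is acceptable; the 'local[*]' master is an assumption, not part of the original code) is:

# fsc2.py - local-friendly version of the exported notebook
from pyspark import SparkConf, SparkContext

# 'local[*]' is an assumption: run on this machine using all available cores.
conf = SparkConf().setAppName('Read File').setMaster('local[*]')
sc = SparkContext.getOrCreate(conf=conf)

# Relative path: sample.txt is expected in the current working directory.
text = sc.textFile('sample.txt')
print(text.collect())

sc.stop()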