python - py4j.protocol.Py4JJavaError: An error occurred while calling o63.save. : java.lang.NoClassDefFoundError: org/apache/spark/Logging
Problem description
I am new to Spark and to big-data components such as HBase. I am trying to write Python code in PySpark that connects to HBase and reads data from it. I am using the following versions:
- Spark version:
spark-3.1.2-bin-hadoop2.7
- Python version:
3.8.5
- HBase version:
hbase-2.3.5
I have standalone HBase and Spark installed locally on Ubuntu 20.04.
Code:
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext.getOrCreate()
sqlc = SQLContext(sc)

# SHC (Spark-HBase Connector) data source
data_source_format = 'org.apache.spark.sql.execution.datasources.hbase'

df = sc.parallelize([("1","Abby","Smith","K","3456main","Orlando","FL","45235"),
                     ("2","Amaya","Williams","L","123Orange","Newark","NJ","27656"),
                     ("3","Alchemy","Davis","P","Warners","Sanjose","CA","34789")]) \
       .toDF(schema=['key','firstName','lastName','middleName','addressLine','city','state','zipCode'])
df.show()

# Catalog mapping DataFrame columns to HBase column families/qualifiers;
# the join/split idiom strips all whitespace from the JSON string.
catalog = ''.join('''{
    "table":{"namespace":"emp_data","name":"emp_info"},
    "rowkey":"key",
    "columns":{
        "key":{"cf":"rowkey","col":"key","type":"string"},
        "fName":{"cf":"person","col":"firstName","type":"string"},
        "lName":{"cf":"person","col":"lastName","type":"string"},
        "mName":{"cf":"person","col":"middleName","type":"string"},
        "addressLine":{"cf":"address","col":"addressLine","type":"string"},
        "city":{"cf":"address","col":"city","type":"string"},
        "state":{"cf":"address","col":"state","type":"string"},
        "zipCode":{"cf":"address","col":"zipCode","type":"string"}
    }
}'''.split())

# Writing
print("Writing into HBase")
df.write\
    .options(catalog=catalog)\
    .format(data_source_format)\
    .save()

# Reading
print("Reading from HBase")
df = sqlc.read\
    .options(catalog=catalog)\
    .format(data_source_format)\
    .load()
print("Program Ends")
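As an aside on the catalog construction above: the `''.join(...split())` idiom compacts the multi-line JSON into a single whitespace-free string, which is only safe because none of the catalog's keys or values contain spaces. A minimal standalone sketch (using a trimmed-down copy of the same catalog) shows the result is still valid JSON:

```python
# Demonstrates the ''.join(str.split()) compaction used for the catalog:
# str.split() discards ALL whitespace (spaces and newlines), and ''.join()
# glues the remaining tokens back together. Safe here only because no key
# or value in the catalog contains a space itself.
import json

catalog = ''.join('''{
    "table":{"namespace":"emp_data","name":"emp_info"},
    "rowkey":"key",
    "columns":{
        "key":{"cf":"rowkey","col":"key","type":"string"},
        "fName":{"cf":"person","col":"firstName","type":"string"}
    }
}'''.split())

parsed = json.loads(catalog)              # still parses after compaction
print(parsed["table"]["name"])            # → emp_info
print(parsed["columns"]["fName"]["col"])  # → firstName
```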
Error message:
Writing into HBase
Traceback (most recent call last):
  File "/mnt/c/Codefiles/pyspark_test.py", line 36, in <module>
    df.write\
  File "/home/aditya/spark-3.1.2-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 1107, in save
  File "/home/aditya/spark-3.1.2-bin-hadoop2.7/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in __call__
  File "/home/aditya/spark-3.1.2-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
  File "/home/aditya/spark-3.1.2-bin-hadoop2.7/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o63.save.
: java.lang.NoClassDefFoundError: org/apache/spark/Logging
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
Solution
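The root cause is a version mismatch, not the Python code: the class `org.apache.spark.Logging` existed only in Spark 1.x (in Spark 2.0 it was moved to the private `org.apache.spark.internal.Logging`), so a connector jar compiled against Spark 1.x, such as older `shc-core` builds, fails with this `NoClassDefFoundError` on Spark 3.1.2. The usual way out is to launch the job with a connector built for your Spark line (for example a `shc-core` rebuilt against Spark 3, or the Apache `hbase-connectors` `hbase-spark` module). The artifact coordinates below are illustrative assumptions, not verified for this exact Spark/HBase combination; check them against your environment:

```shell
# Sketch, not a verified fix: pass a Spark-3-compatible HBase connector jar
# instead of a Spark 1.x-era shc-core build. The coordinates below are
# placeholders/assumptions -- substitute a build that matches your
# Spark (3.1.2, Scala 2.12) and HBase (2.3.5) versions.
spark-submit \
  --packages org.apache.hbase.connectors.spark:hbase-spark:1.0.0 \
  --conf spark.executor.extraClassPath=/path/to/hbase/libs/* \
  pyspark_test.py
```

With the Apache `hbase-spark` connector, the data source format string also changes (typically `org.apache.hadoop.hbase.spark` rather than `org.apache.spark.sql.execution.datasources.hbase`), so the code's `data_source_format` would need to match whichever connector is actually on the classpath.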