python - 在使用 Hadoop/Spark/Yarn 的 RaspberryPi 集群上的 Jupyter Notebook 中运行 PySpark 代码时出错
问题描述
我正在尝试使用 Yarn 集群在带有 PySpark 的 Jupyter Notebook 中运行示例代码。
我认为我的集群工作正常。我可以看到所有节点都在运行
yarn node -list -all
OpenJDK Client VM warning: You have loaded library /opt/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
Total Nodes:3
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
pi2:40045 RUNNING pi2:8042 0
pi1:38067 RUNNING pi1:8042 0
pi3:35139 RUNNING pi3:8042 0
我正在使用此处提供的示例笔记本:https ://github.com/dmanning21h/pi-cluster/blob/master/notebooks/picluster.ipynb
我可以运行除此单元之外的所有笔记本代码:
# Show label counts by class
df.groupBy("label") \
.count() \
.orderBy(col("count").desc()) \
.show()
Spark 卡在这项工作上,并出现以下错误:
Exception in thread "map-output-dispatcher-0" java.lang.UnsatisfiedLinkError: /opt/hadoop/lib/native/libzstd-jni.so: /opt/hadoop/lib/native/libzstd-jni.so: wrong ELF class: ELFCLASS64 (Possible cause: architecture word width mismatch)
Unsupported OS/arch, cannot find /linux/arm/libzstd-jni.so or load zstd-jni from system libraries. Please try building from source the jar or providing libzstd-jni in your system.
at java.base/java.lang.ClassLoader$NativeLibrary.load0(Native Method)
at java.base/java.lang.ClassLoader$NativeLibrary.load(ClassLoader.java:2442)
at java.base/java.lang.ClassLoader$NativeLibrary.loadLibrary(ClassLoader.java:2498)
at java.base/java.lang.ClassLoader.loadLibrary0(ClassLoader.java:2694)
at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2659)
at java.base/java.lang.Runtime.loadLibrary0(Runtime.java:830)
at java.base/java.lang.System.loadLibrary(System.java:1873)
at com.github.luben.zstd.util.Native.load(Native.java:73)
at com.github.luben.zstd.util.Native.load(Native.java:60)
at com.github.luben.zstd.ZstdOutputStream.<clinit>(ZstdOutputStream.java:15)
at org.apache.spark.io.ZStdCompressionCodec.compressedOutputStream(CompressionCodec.scala:224)
at org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:913)
at org.apache.spark.ShuffleStatus.$anonfun$serializedMapStatus$2(MapOutputTracker.scala:210)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.ShuffleStatus.withWriteLock(MapOutputTracker.scala:72)
at org.apache.spark.ShuffleStatus.serializedMapStatus(MapOutputTracker.scala:207)
at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:457)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Exception in thread "map-output-dispatcher-1" java.lang.NoClassDefFoundError: Could not initialize class com.github.luben.zstd.ZstdOutputStream
at org.apache.spark.io.ZStdCompressionCodec.compressedOutputStream(CompressionCodec.scala:224)
at org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:913)
at org.apache.spark.ShuffleStatus.$anonfun$serializedMapStatus$2(MapOutputTracker.scala:210)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.ShuffleStatus.withWriteLock(MapOutputTracker.scala:72)
at org.apache.spark.ShuffleStatus.serializedMapStatus(MapOutputTracker.scala:207)
at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:457)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
我也尝试过此评论的解决方案,但它不起作用。
解决方案
推荐阅读
- java - Spring Data JPA 无法实例化持久化程序 org.hibernate.persister.entity.JoinedSubclassEntityPersister
- html - Angular 4:只要表单未验证,就禁用按钮
- java - RX Java - subscribeOn 和 observeOn 的不同行为
- c# - Bacnet/IP 与 .NET
- java - 注释的 AOP 在 Spring Boot 中不起作用
- html - 与媒体查询相同的边距
- ruby-on-rails - Rails,请求的资源上不存在“Access-Control-Allow-Origin”标头
- python - Tkinter 按钮不执行命令
- jasmine - 无法获取字段形式的 SVG 对象并移动它们
- java - 在 sqlite 数据库中搜索数据时出错