apache-spark - How to fix the Java error in the Elephas basic example?
Problem description
I am using PySpark with Elephas, but it does not currently work. I tried the example given in the Elephas docs on GitHub. Note that in the PySpark console my Keras and Pandas code runs fine (as long as it does not use the PySpark libraries), but the example at https://github.com/maxpumperla/elephas that interfaces Keras and PySpark through Elephas fails, and I cannot figure out how to fix it. My entire PySpark setup uses Python 3.7.
Here is the content of my script, followed by the error message:
from pyspark import SparkContext, SparkConf
conf = SparkConf().setAppName('Elephas_App').setMaster('local[4]')
# local[4] runs the Elephas_App application on the local machine only, with 4 cores
sc = SparkContext(conf=conf)
# Load the packages
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import SGD
model = Sequential()
model.add(Dense(128, input_dim=784))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer=SGD())
## Elephas integration
from elephas.utils.rdd_utils import to_simple_rdd
rdd = to_simple_rdd(sc, x_train, y_train)
from elephas.spark_model import SparkModel
spark_model = SparkModel(model, frequency='epoch', mode='asynchronous')
spark_model.fit(rdd, epochs=20, batch_size=32, verbose=0, validation_split=0.1)
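As written, the script references x_train and y_train without ever defining them. A minimal placeholder, assuming MNIST-shaped data (784 input features, 10 one-hot classes) to match the model's input_dim and output layer:

```python
import numpy as np

# Placeholder training data shaped like flattened MNIST:
# 1000 samples, 784 features, 10 one-hot-encoded classes.
# Substitute your real dataset here before calling to_simple_rdd.
x_train = np.random.rand(1000, 784).astype("float32")
y_train = np.eye(10, dtype="float32")[np.random.randint(0, 10, size=1000)]
```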
And the error message:
>>> Distribute load
Traceback (most recent call last):
File "/home/admin-tv/deeplearning/elephas_ann.py", line 100, in <module>
spark_model.fit(rdd, epochs=20, batch_size=32, verbose=0, validation_split=0.1)
File "/usr/local/lib/python3.7/dist-packages/elephas/spark_model.py", line 151, in fit
self._fit(rdd, epochs, batch_size, verbose, validation_split)
File "/usr/local/lib/python3.7/dist-packages/elephas/spark_model.py", line 182, in _fit
rdd.mapPartitions(worker.train).collect()
File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 816, in collect
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: java.lang.IllegalArgumentException: Unsupported class file major version 55
at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:166)
at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:148)
at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:136)
at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:237)
at org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:49)
at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:517)
at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:500)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:134)
at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:134)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:134)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at org.apache.spark.util.FieldAccessFinder$$anon$3.visitMethodInsn(ClosureCleaner.scala:500)
at org.apache.xbean.asm6.ClassReader.readCode(ClassReader.java:2175)
at org.apache.xbean.asm6.ClassReader.readMethod(ClassReader.java:1238)
at org.apache.xbean.asm6.ClassReader.accept(ClassReader.java:631)
at org.apache.xbean.asm6.ClassReader.accept(ClassReader.java:355)
at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:307)
at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:306)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:306)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2326)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2100)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:990)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
at org.apache.spark.rdd.RDD.collect(RDD.scala:989)
at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:166)
at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
Solution
After some research, I switched to Java 8 and removed my Java 11 installation. Then I manually redid all of the installations under Python 2.7. Now I believe it works. I also had to adjust the script to better fit my x_train and y_train; I used Keras's predict() function to get an array that I believe is consistent.
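Rather than uninstalling Java 11 outright, it is often enough to point Spark at a Java 8 installation before the SparkContext is created. A sketch, assuming an OpenJDK 8 install at a typical Debian/Ubuntu location (the path is an assumption; adjust it to your system):

```python
import os

# Tell Spark (via py4j) which JVM to launch. This must be set
# before SparkContext is created. The path below is an example
# location for OpenJDK 8 on Debian/Ubuntu -- adjust to your system.
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
```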
Java 11 does not work with Spark 2.4; apparently it is supported starting with Spark 3, so check your versions.
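The "Unsupported class file major version 55" message itself encodes the Java version: class file major versions are offset from the Java release number by 44, so 55 corresponds to Java 11, while Spark 2.4 expects classes compiled for Java 8 (major version 52). A quick sketch of the mapping:

```python
# Class file major versions are offset from the Java release by 44
# (Java 8 -> 52, Java 11 -> 55, Java 17 -> 61, ...).
def class_file_major_version(java_release):
    return java_release + 44

def java_release(major_version):
    return major_version - 44

print(java_release(55))             # the JVM behind the error: Java 11
print(class_file_major_version(8))  # what Spark 2.4 expects: 52
```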