Converting a pickle (.pck) file to a Spark DataFrame using Python

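Problem description

Hello! Dear members, I want to train a model using BigDL, and I have a medical imaging dataset in the form of pickle object files (.pck). Each pickle file is a 3D image (a 3D array).

I tried to convert it to a Spark DataFrame using the BigDL Python API: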

 from pyspark.sql import SQLContext

 # sc is the existing SparkContext
 pickleRdd = sc.pickleFile("/home/student/BigDL-trainings/elephantscale/data/volumetric_data/329637-8.pck")
 sqlContext = SQLContext(sc)
 df = sqlContext.createDataFrame(pickleRdd)

It throws this error:

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 2, localhost, executor driver)
: java.io.IOException: file:/home/student/BigDL-trainings/elephantscale/data/volumetric_data/329637-8.pck not a SequenceFile

I ran this code on both Python 3.5 and Python 2.7 and got the same error in both cases.

Tags: apache-spark, bigdl

Solution
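
The "not a SequenceFile" message points at the likely cause: sc.pickleFile only reads data previously written by RDD.saveAsPickleFile, which stores pickled objects inside a Hadoop SequenceFile. A file produced by Python's own pickle module has a different layout, so Spark rejects it. One possible workaround, sketched below, is to unpickle the file on the driver with the standard pickle module and hand plain Python rows to createDataFrame. This is only a sketch under assumptions: the file is assumed to hold a single object shaped (depth, height, width), either a NumPy array or an equivalent nested list, and the function names load_volume_rows and volume_to_dataframe as well as the column names slice_index and pixels are hypothetical, chosen for illustration.

```python
import pickle


def load_volume_rows(path):
    """Load a pickled 3D volume and return one (slice_index, pixels) row per 2D slice.

    Assumption: the file holds a single object shaped (depth, height, width),
    either a NumPy array or an equivalent nested list.
    """
    with open(path, "rb") as f:
        volume = pickle.load(f)  # plain Python pickle, read on the driver
    if hasattr(volume, "tolist"):
        volume = volume.tolist()  # NumPy array -> nested Python lists
    rows = []
    for index, image in enumerate(volume):
        # Flatten each 2D slice into a flat list of floats so that Spark
        # can infer an array<double> column.
        pixels = [float(value) for line in image for value in line]
        rows.append((index, pixels))
    return rows


def volume_to_dataframe(sql_context, path):
    """Build a DataFrame with the (hypothetical) columns slice_index and pixels.

    sql_context can be the SQLContext from the question or a SparkSession;
    both provide createDataFrame(data, schema).
    """
    rows = load_volume_rows(path)
    return sql_context.createDataFrame(rows, ["slice_index", "pixels"])
```

With the sqlContext from the question, this would be called as df = volume_to_dataframe(sqlContext, "/home/student/BigDL-trainings/elephantscale/data/volumetric_data/329637-8.pck"). Each row carries one flattened slice, so the original height and width have to be tracked separately if the images need to be reshaped later. For a dataset of many large volumes, loading everything on the driver may not scale; in that case the unpickling step could be moved into a map over an RDD of file paths.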

