Spark error: PickleException: expected zero arguments for construction of ClassDict (Databricks)

Problem description

I am converting a Pandas DataFrame to a Spark DataFrame using the following code:

# adapted from https://stackoverflow.com/questions/37513355/converting-pandas-dataframe-into-spark-dataframe-error
from pyspark.sql.types import (StructType, StructField, TimestampType,
                               LongType, IntegerType, FloatType,
                               BooleanType, StringType)

# Auxiliary functions
def equivalent_type(f):
    # Map a pandas dtype to the corresponding Spark SQL type
    if f == 'datetime64[ns]': return TimestampType()
    elif f == 'int64': return LongType()
    elif f == 'int32': return IntegerType()
    elif f == 'float64': return FloatType()
    elif f == 'bool': return BooleanType()
    else: return StringType()

def define_structure(string, format_type):
    # Fall back to StringType for any dtype that is not recognized
    try: typo = equivalent_type(format_type)
    except Exception: typo = StringType()
    return StructField(string, typo)

# Given a pandas dataframe, return a Spark dataframe with an explicit schema.
def pandas_to_spark(pandas_df):
    columns = list(pandas_df.columns)
    types = list(pandas_df.dtypes)
    struct_list = []
    for column, typo in zip(columns, types):
        struct_list.append(define_structure(column, typo))
    p_schema = StructType(struct_list)
    # sqlContext is pre-defined in Databricks notebooks; spark.createDataFrame works the same way
    return sqlContext.createDataFrame(pandas_df, p_schema)

sdf = pandas_to_spark(pdf)
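
As an aside on the mapping above: pandas' float64 is a 64-bit double, while Spark's FloatType is only 32 bits wide, so DoubleType is the closer equivalent. This mismatch is probably not what triggers the pickle error below, but a small sketch of the adjusted branch (everything else unchanged) would be:

from pyspark.sql.types import DoubleType

def equivalent_type(f):
    if f == 'datetime64[ns]': return TimestampType()
    elif f == 'int64': return LongType()
    elif f == 'int32': return IntegerType()
    elif f == 'float64': return DoubleType()   # 64-bit floats -> DoubleType rather than FloatType
    elif f == 'bool': return BooleanType()
    else: return StringType()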

I can view the dataframe with display(sdf).

However, when I run an action such as collect(), or save the table with:

sdf.write.saveAsTable("TESTsdf")

I get the following error:

net.razorvine.pickle.PickleException: expected zero arguments for construction of ClassDict (for numpy.core.multiarray._reconstruct)

Tags: python, pandas, apache-spark, pyspark, databricks

Solution
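
The numpy.core.multiarray._reconstruct named in the message is the hook pickle uses to rebuild numpy arrays, so the usual culprit is that some cells of pdf contain numpy objects (whole ndarrays, or numpy scalars inside an object column) that Spark cannot map onto the declared schema. display(sdf) can still succeed because Spark only serializes all rows once an action such as collect() or saveAsTable() forces full evaluation. A minimal sketch of one common workaround, assuming numpy objects in pdf are indeed the cause (the affected column names are unknown here, so they are detected first):

import numpy as np

# 1. Find columns whose cells hold numpy objects (ndarrays or numpy scalars);
#    pdf is the original pandas DataFrame from the question.
is_numpy_obj = lambda v: isinstance(v, (np.ndarray, np.generic))
suspect_cols = [c for c in pdf.columns if pdf[c].map(is_numpy_obj).any()]
print("Columns holding numpy objects:", suspect_cols)

# 2. Replace numpy values with plain Python values so that pickling the rows
#    no longer requires numpy.core.multiarray._reconstruct on the JVM side.
def to_python(v):
    if isinstance(v, np.ndarray):
        # equivalent_type() maps object columns to StringType, so store the
        # array as a string; use v.tolist() plus an ArrayType schema instead
        # if the column should stay a real array.
        return str(v.tolist())
    if isinstance(v, np.generic):
        return v.item()   # numpy scalar -> plain Python int/float/bool
    return v

pdf_clean = pdf.copy()
for c in suspect_cols:
    pdf_clean[c] = pdf_clean[c].map(to_python)

sdf = pandas_to_spark(pdf_clean)
sdf.write.saveAsTable("TESTsdf")

Depending on the data, enabling Arrow-based conversion (spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")) can also sidestep the pickle path entirely, though columns that hold numpy arrays may still need the cleanup above.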

