python - SparkError: PickleException: expected zero arguments for construction of ClassDict (Databricks)
Problem description
I convert a Pandas DataFrame to a Spark DataFrame using the following code:
# adapted from https://stackoverflow.com/questions/37513355/converting-pandas-dataframe-into-spark-dataframe-error
from pyspark.sql.types import (
    BooleanType, FloatType, IntegerType, LongType,
    StringType, StructField, StructType, TimestampType,
)

# Auxiliary functions
def equivalent_type(f):
    # Map a pandas dtype to a Spark SQL type; anything unrecognized becomes a string.
    if f == 'datetime64[ns]': return TimestampType()
    elif f == 'int64': return LongType()
    elif f == 'int32': return IntegerType()
    # Note: FloatType is 32-bit; DoubleType is the 64-bit match for float64.
    elif f == 'float64': return FloatType()
    elif f == 'bool': return BooleanType()
    else: return StringType()

def define_structure(string, format_type):
    # Build one schema field, falling back to StringType if the mapping fails.
    try: typo = equivalent_type(format_type)
    except Exception: typo = StringType()
    return StructField(string, typo)

# Given pandas dataframe, it will return a spark's dataframe.
def pandas_to_spark(pandas_df):
    columns = list(pandas_df.columns)
    types = list(pandas_df.dtypes)
    struct_list = []
    for column, typo in zip(columns, types):
        struct_list.append(define_structure(column, typo))
    p_schema = StructType(struct_list)
    # sqlContext is predefined in Databricks notebooks.
    return sqlContext.createDataFrame(pandas_df, p_schema)

sdf = pandas_to_spark(pdf)
I can view the DataFrame with display(sdf).
However, if I call collect() on it, or save it as a table with sdf.write.saveAsTable("TESTsdf"),
I get the following error:
net.razorvine.pickle.PickleException: expected zero arguments for construction of ClassDict (for numpy.core.multiarray._reconstruct)
Solution
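One commonly reported trigger for this exception is NumPy objects (arrays or NumPy scalars) sitting inside the rows handed to Spark, for example in an object column that the schema above maps to StringType. Whether that applies to the `pdf` in the question is an assumption, since its contents are not shown. The sketch below uses only pandas and NumPy to find such columns and convert their values to plain Python objects; `numpy_columns`, `to_native`, and `pdf_example` are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd


def numpy_columns(pandas_df):
    """Return names of object columns that hold NumPy arrays or scalars."""
    flagged = []
    for name in pandas_df.columns:
        col = pandas_df[name]
        if col.dtype == object:
            is_numpy = col.map(lambda v: isinstance(v, (np.ndarray, np.generic)))
            if is_numpy.any():
                flagged.append(name)
    return flagged


def to_native(value):
    """Convert a NumPy array or scalar to a plain Python object."""
    if isinstance(value, np.ndarray):
        return value.tolist()
    if isinstance(value, np.generic):
        return value.item()
    return value


# Hypothetical sample frame standing in for `pdf` from the question.
pdf_example = pd.DataFrame({
    "id": [1, 2],
    "vec": [np.array([1.0, 2.0]), np.array([3.0])],
})

bad = numpy_columns(pdf_example)

# Replace NumPy values with native Python ones in the flagged columns only.
cleaned = pdf_example.copy()
for name in bad:
    cleaned[name] = cleaned[name].map(to_native)
```

If `numpy_columns(pdf)` comes back non-empty, retrying `pandas_to_spark` on the cleaned frame is a reasonable next step. List-valued columns would probably also need an `ArrayType` field in the schema rather than `StringType`.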