python - TypeError: element in array field Category: Can not merge type and
Problem description
I am reading a CSV file with pandas into a two-column dataframe, and then trying to convert it to a Spark dataframe. The code for this is:
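from pyspark.sql import SQLContext
# sc is assumed to be an existing SparkContext (for example, the one provided by the pyspark shell)
sqlCtx = SQLContext(sc)
sdf = sqlCtx.createDataFrame(df)  # df is the pandas dataframe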
The dataframe:
print(df)
gives this:
    Name                                    Category
0   EDSJOBLIST apply at www.edsjoblist.com  ['biotechnology', 'clinical', 'diagnostic', 'd...
1   Power Direct Marketing                  ['advertising', 'analytics', 'brand positionin...
2   CHA Hollywood Medical Center, L.P.      ['general medical and surgical hospital', 'hea...
3   JING JING GOURMET                       [nan]
4   TRUE LIFE KINGDOM MINISTRIES            ['religious organization']
5   fasterproms                             ['microsoft .net']
6   STEREO ZONE                             ['accessory', 'audio', 'car audio', 'chrome', ...
7   SAN FRANCISCO NEUROLOGICAL SOCIETY      [nan]
8   Fl Advisors                             ['comprehensive financial planning', 'financia...
9   Fortunatus LLC                          ['bottle', 'bottling', 'charitable', 'dna', 'f...
10  TREADS LLC                              ['retail', 'wholesaling']
Can anyone help me with this?
Solution
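Spark can have difficulty inferring a schema from pandas columns of object dtype. Here the Category lists hold strings in some rows and nan (a float) in others, which is likely why the element types cannot be merged. A potential workaround is to convert everything to a string first: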
sdf = sqlCtx.createDataFrame(df.astype(str))
One consequence of this is that everything, including nan, will be converted to a string. You will need to take care to handle these conversions properly and cast the columns to the appropriate types.
For instance, if you had a column "colA" with floating point values, you can use something like the following to convert the string "nan" to a null:
from pyspark.sql.functions import col, when
# when() without otherwise() yields null for rows that do not match, so "nan" becomes null
sdf = sdf.withColumn("colA", when(col("colA") != "nan", col("colA").cast("float")))
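If the Category values need to stay arrays rather than strings, one alternative to consider (not part of the answer above) is to drop the nan entries from each list in pandas before calling createDataFrame, so that every list holds only strings. The sketch below is a minimal illustration under that assumption; the helper name clean_categories is hypothetical, and it has not been checked against the original data.

```python
def clean_categories(values):
    """Return only the string entries of a category list.

    NaN entries (floats) and any other non-string items are dropped.
    A cell that is not a list at all, such as a bare NaN, becomes an empty list.
    """
    if not isinstance(values, list):
        return []
    return [item for item in values if isinstance(item, str)]


# Plain-Python examples mirroring the rows shown in the question
print(clean_categories(['retail', 'wholesaling']))
print(clean_categories([float('nan')]))
```

With pandas this could be applied as df["Category"] = df["Category"].apply(clean_categories) before the conversion. If schema inference still struggles with the empty lists, passing an explicit schema to createDataFrame is another option.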