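python - Is there a way to convert a list of dictionaries stored as strings into a single column in a DataFrame?
Problem description
FYI, I can't use PySpark!
My data looks like this: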
0 { "CountryOfManufacture": "China", "Tags": ["U...
1 { "CountryOfManufacture": "China", "Tags": ["U...
2 { "CountryOfManufacture": "China", "Tags": [] }
3 { "CountryOfManufacture": "Japan", "Tags": ["3...
4 { "CountryOfManufacture": "Japan", "Tags": ["1...
... ...
222 { "CountryOfManufacture": "USA", "ShelfLife": ...
223 { "CountryOfManufacture": "USA", "ShelfLife": ...
224 { "CountryOfManufacture": "USA", "ShelfLife": ...
225 { "CountryOfManufacture": "USA", "ShelfLife": ...
226 { "CountryOfManufacture": "USA", "ShelfLife": .
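So each dictionary contains several different values. I'm only interested in the first one (CountryOfManufacture), and I'd like to split it out and add it to another DataFrame.
Thanks!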
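One way to pull out just that key with plain pandas and the standard library is sketched below. It assumes each cell is a JSON string, that the column is named CustomFields, and that both frames share the same index; the `other` frame, its `ProductId` column, and the `country_of` helper are hypothetical names used only for illustration.

```python
import json

import pandas as pd

# Sample rows copied from the question; each cell is a JSON *string*, not a dict.
df = pd.DataFrame({
    "CustomFields": [
        '{ "CountryOfManufacture": "China", "Tags": ["USB Powered"] }',
        '{ "CountryOfManufacture": "China", "Tags": [] }',
        '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }',
    ]
})


def country_of(raw):
    """Parse one JSON string and return its CountryOfManufacture, or None."""
    try:
        return json.loads(raw).get("CountryOfManufacture")
    except (TypeError, ValueError, AttributeError):
        # Missing cells (NaN), malformed JSON, or non-dict JSON end up here.
        return None


# Hypothetical target frame; rows are matched by index, not by position.
other = pd.DataFrame({"ProductId": [101, 102, 103]})
other["CountryOfManufacture"] = df["CustomFields"].map(country_of)
print(other)
```

Rows whose cell cannot be parsed get None instead of raising, which is probably what you want if some rows lack the key, like the later USA rows that start with other fields.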
Solution
When I try using from_records, my result looks like this:
CustomFields
0 { "CountryOfManufacture": "China", "Tags": ["U...
1 { "CountryOfManufacture": "China", "Tags": ["U...
2 { "CountryOfManufacture": "China", "Tags": [] }
3 { "CountryOfManufacture": "Japan", "Tags": ["3...
4 { "CountryOfManufacture": "Japan", "Tags": ["1...
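I think this is because my data is in an unusual format. It was originally provided as a CSV file, and this is one of its columns. All the other columns have int/float/object dtypes, whereas this column already looks like a dictionary when viewed in Excel.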
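Since the column comes from a CSV, pandas reads it as plain text by default, which would explain why from_records leaves it untouched. A possible fix is to parse it while loading, using the `converters` argument of `pd.read_csv`. The sketch below builds a small CSV in memory so it runs on its own; the `Price` column is a made-up stand-in for the other columns, and CustomFields is assumed to be the column name.

```python
import io
import json

import pandas as pd

# Build a tiny CSV in memory that mimics the file; "Price" is a hypothetical column.
source = pd.DataFrame({
    "Price": [9.99, 19.5],
    "CustomFields": [
        '{ "CountryOfManufacture": "China", "Tags": ["USB Powered"] }',
        '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }',
    ],
})
csv_text = source.to_csv(index=False)

# Default read: the column comes back as ordinary strings (object dtype).
plain = pd.read_csv(io.StringIO(csv_text))
print(type(plain.loc[0, "CustomFields"]))

# With a converter, json.loads runs on every cell of that column during the read.
# Note: json.loads raises on empty cells, so a file with blanks needs a safer wrapper.
parsed = pd.read_csv(io.StringIO(csv_text), converters={"CustomFields": json.loads})
print(type(parsed.loc[0, "CustomFields"]))
print(parsed["CustomFields"].map(lambda d: d["CountryOfManufacture"]).tolist())
```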
The data you used in the example below is formatted the way I would expect, but this is what mine looks like when I convert it to a list:
['{ "CountryOfManufacture": "China", "Tags": ["USB Powered"] }', '{ "CountryOfManufacture": "China", "Tags": ["USB Powered"] }', '{ "CountryOfManufacture": "China", "Tags": [] }', '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["16GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["16GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["16GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["16GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["16GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["16GB","USB Powered"] }', '{ "CountryOfManufacture": "China", "Tags": ["Comedy"] }', ...
As you can see, I have extra quotes around each dictionary, shown here with a single row: ['{ "CountryOfManufacture": "China", "Tags": ["USB Powered"] }'.
Is there a way to solve this without PySpark?
Thanks!
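The extra quotes suggest that every list element is a JSON string rather than a dict, so from_records has nothing to expand. A minimal sketch, assuming every element is valid JSON: decode each string with `json.loads` first, then hand the resulting dicts to `DataFrame.from_records`. The `raw`, `records`, and `expanded` names are only illustrative.

```python
import json

import pandas as pd

# A few elements copied from the list in the question.
raw = [
    '{ "CountryOfManufacture": "China", "Tags": ["USB Powered"] }',
    '{ "CountryOfManufacture": "China", "Tags": [] }',
    '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }',
    '{ "CountryOfManufacture": "Japan", "Tags": ["16GB","USB Powered"] }',
]

# Step 1: turn each JSON string into a real Python dict.
records = [json.loads(s) for s in raw]

# Step 2: from_records now sees dicts and creates one column per key.
expanded = pd.DataFrame.from_records(records)
print(expanded)

# Keep only the column of interest as a single-column DataFrame.
countries = expanded[["CountryOfManufacture"]]
print(countries)
```

Keys that appear only in some rows, such as ShelfLife in the later rows, would show up as columns with NaN where they are absent.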