Is there a way to convert an array-formatted list of dictionaries into a single column in a dataframe?

Problem description

FYI: I can't use PySpark!

My data looks like this:

0   { "CountryOfManufacture": "China", "Tags": ["U...
1   { "CountryOfManufacture": "China", "Tags": ["U...
2   { "CountryOfManufacture": "China", "Tags": [] }
3   { "CountryOfManufacture": "Japan", "Tags": ["3...
4   { "CountryOfManufacture": "Japan", "Tags": ["1...
... ...
222 { "CountryOfManufacture": "USA", "ShelfLife": ...
223 { "CountryOfManufacture": "USA", "ShelfLife": ...
224 { "CountryOfManufacture": "USA", "ShelfLife": ...
225 { "CountryOfManufacture": "USA", "ShelfLife": ...
226 { "CountryOfManufacture": "USA", "ShelfLife": .

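So the dictionaries contain different values. I'm only interested in the first one (the country of manufacture), and I'd like to split it out and add it to another dataframe.

Thanks!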

Tags: python, dataframe, dictionary

Solution


When I try using from_records, my result looks like this:

                                        CustomFields
0  { "CountryOfManufacture": "China", "Tags": ["U...
1  { "CountryOfManufacture": "China", "Tags": ["U...
2    { "CountryOfManufacture": "China", "Tags": [] }
3  { "CountryOfManufacture": "Japan", "Tags": ["3...
4  { "CountryOfManufacture": "Japan", "Tags": ["1...

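I think this is because my data is in an unusual format. It was originally provided in a CSV file, and this is one of its columns. All the other columns are integer/float/object, whereas this column already looks like a dictionary when you view it in Excel.

The data you used in the example below is formatted the way I expected, but this is what mine looks like when I convert it to a list: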

['{ "CountryOfManufacture": "China", "Tags": ["USB Powered"] }', '{ "CountryOfManufacture": "China", "Tags": ["USB Powered"] }', '{ "CountryOfManufacture": "China", "Tags": [] }', '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["16GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["16GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["16GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["16GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["16GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["16GB","USB Powered"] }', '{ "CountryOfManufacture": "China", "Tags": ["Comedy"] }', ...

As you can see, there are extra quotes around each dictionary. Here is a single row to illustrate: ['{ "CountryOfManufacture": "China", "Tags": ["USB Powered"] }'.
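One way to read those extra quotes: each element appears to be a JSON string rather than a dict, which would also explain why from_records leaves everything in one column. A minimal check, assuming the strings are valid JSON (the two rows are copied from the list above; the variable names are made up for illustration):

```python
import json

# Two rows copied from the list above; each one is a str, not a dict.
raw = [
    '{ "CountryOfManufacture": "China", "Tags": ["USB Powered"] }',
    '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }',
]

print(type(raw[0]))  # <class 'str'>

# json.loads turns each JSON string into a real Python dict.
parsed = [json.loads(s) for s in raw]
print(type(parsed[0]))  # <class 'dict'>

# Once parsed, the key of interest can be read directly.
countries = [d["CountryOfManufacture"] for d in parsed]
print(countries)  # ['China', 'Japan']
```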

Is there a way to solve this without PySpark?

Thanks!
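For what it's worth, here is a rough sketch of how this might be done with plain pandas and the standard json module, no PySpark involved. It assumes every cell in the column is a valid JSON string. The column name CustomFields comes from the from_records output above. The `other` frame and its ProductId column are hypothetical stand-ins for the dataframe the country should be added to.

```python
import json

import pandas as pd

# Stand-in for the frame read from the CSV; the cells are JSON strings.
df = pd.DataFrame({
    "CustomFields": [
        '{ "CountryOfManufacture": "China", "Tags": ["USB Powered"] }',
        '{ "CountryOfManufacture": "China", "Tags": [] }',
        '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }',
    ]
})

# Parse each string and pick out the one key.
# .get() yields None instead of raising if a row lacks the key.
countries = df["CustomFields"].apply(
    lambda s: json.loads(s).get("CountryOfManufacture")
)

# Hypothetical target frame that shares the same index as df.
other = pd.DataFrame({"ProductId": [101, 102, 103]})
other["CountryOfManufacture"] = countries

print(other)
```

The assignment aligns on the index, so this only lines up if both frames have matching row labels. If they don't, resetting the index or merging on a shared key would be the safer route.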

