python - Python - 管理嵌套的 JSON 到 DataFrame
问题描述
我正处于我的数据科学家之旅的开始阶段,我正在努力使用 JSON 和 Python。即使我知道 DataFrame 操作和 JSON 格式操作的基础知识,我也只是得到了一个包含以下数据的 JSON 文件:
{"Orders":
[
{
"OrderID":"1000004209",
"Email":"name@mail.com",
"AnnoNascita":"-",
"Age":"-",
"Gender":"-",
"Provincia":"CR",
"OrderDate":"2019-05-02 14:05:16",
"OrderStatus":"wc-failed",
"OrderTotal":"31.90",
"TotalDiscount":"0",
"OrderSubTotal":"31.9",
"Coupon":"",
"OrderItems": {
"Item": {
"ProductName":"Eau de Parfum Zafferano",
"Sku":"44160",
"Quantity":"1",
"ItemCost":"27.00",
"ItemTotal":"27",
"Category":"ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)"
}
}
},
{
"OrderID":"1000004210",
"Email":"name@mail.com",
"AnnoNascita":"-",
"Age":"-",
"Gender":"-",
"Provincia":"GE",
"OrderDate":"2019-05-02 14:17:32",
"OrderStatus":"wc-cancelled",
"OrderTotal":"9.00",
"TotalDiscount":"0",
"OrderSubTotal":"9",
"Coupon":"",
"OrderItems": {
"Item": {
"ProductName":"Sapone Marsiglia 200 g",
"Sku":"01026",
"Quantity":"1",
"ItemCost":"4.10",
"ItemTotal":"4.1",
"Category":"MARSEILLE;SAPONETTE"
}
}
},
{
"OrderID":"1000004211",
"Email":"name@mail.com",
"AnnoNascita":"-",
"Age":"-",
"Gender":"-",
"Provincia":"GE",
"OrderDate":"2019-05-02 14:21:42",
"OrderStatus":"wc-cancelled",
"OrderTotal":"31.90",
"TotalDiscount":"0",
"OrderSubTotal":"31.9",
"Coupon":"",
"OrderItems": {
"Item": {
"ProductName":"Eau de Parfum Zafferano",
"Sku":"44160",
"Quantity":"1",
"ItemCost":"27.00",
"ItemTotal":"27",
"Category":"ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)"
}
}
},
{
"OrderID":"1000004235",
"Email":"name@mail.com",
"AnnoNascita":"-",
"Age":"-",
"Gender":"-",
"Provincia":"CR",
"OrderDate":"2019-05-03 09:37:06",
"OrderStatus":"wc-cancelled",
"OrderTotal":"31.90",
"TotalDiscount":"0",
"OrderSubTotal":"31.9",
"Coupon":"",
"OrderItems": {
"Item": [
{
"ProductName":"Eau de Parfum Zafferano",
"Sku":"44160",
"Quantity":"1",
"ItemCost":"27.00",
"ItemTotal":"27",
"Category":"ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)"
},
{
"ProductName":"Sapone Vegetale Lavanda Officinalis Bio",
"Sku":"01049",
"Quantity":"1",
"ItemCost":"4.90",
"ItemTotal":"4.9",
"Category":"ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)"
}
]
}
},
{
"OrderID":"1000004292",
"Email":"name@mail.com",
"AnnoNascita":"-",
"Age":"-",
"Gender":"-",
"Provincia":"CR",
"OrderDate":"2019-05-06 08:52:47",
"OrderStatus":"wc-failed",
"OrderTotal":"64.90",
"TotalDiscount":"0",
"OrderSubTotal":"64.9",
"Coupon":"",
"OrderItems": {
"Item": [
{
"ProductName":"Schiuma da Barba Pour Homme",
"Sku":"45396",
"Quantity":"2",
"ItemCost":"12.00",
"ItemTotal":"24",
"Category":"POUR HOMME;LINEE UOMO;RASATURA"
},
{
"ProductName":"Detergente Intimo Delicato Mamma",
"Sku":"38420",
"Quantity":"1",
"ItemCost":"11.00",
"ItemTotal":"11",
"Category":"POUR HOMME;LINEE UOMO;RASATURA"
},
{
"ProductName":"Spray per Ambiente - Preziosa",
"Sku":"44231",
"Quantity":"2",
"ItemCost":"10.00",
"ItemTotal":"20",
"Category":"POUR HOMME;LINEE UOMO;RASATURA"
},
{
"ProductName":"Cuscinetti Profumati - Preziosa",
"Sku":"45491",
"Quantity":"1",
"ItemCost":"9.90",
"ItemTotal":"9.9",
"Category":"POUR HOMME;LINEE UOMO;RASATURA"
}
]
}
},
]
}
我想做的是用 Pandas 构建一个 DataFrame 来操作数据和收集信息。
起初,我尝试使用pd.read_json('path_to_file')
Pandas 中的函数,但得到了以下结果:
Orders
0 {'OrderID': '1000004209', 'Email': 'name@ma...
1 {'OrderID': '1000004210', 'Email': 'name@ma...
2 {'OrderID': '1000004211', 'Email': 'name@ma...
3 {'OrderID': '1000004235', 'Email': 'name@ma...
4 {'OrderID': '1000004292', 'Email': 'name@ma...
我尝试使用从每一行获取一个 DatFrame pd.DataFrame(df['Orders'])
,但它返回相同的 DataFrame。我尝试使用 for 循环在新的 DataFrame 中附加单行,但我也走到了一条死胡同。
在 StackOverflow 上查看与此相关的所有主题时,我非常迷茫,却没有找到解决问题的方法。
实际上,我需要创建一个 DataFrame,其中每个主要值都有一个列(如“OrderID”、“Email”、“AnnoNascita”等)和一个名为“OrderItems”的列,其中包含“Item”中的所有值的数组”。我在想类似的事情:
OrderID Email AnnoNascita Age Gender Provincia OrderDate OrderStatus OrderTotal Coupon OrderItems
0 1000004209 name@mail.com - - - CR 2019-05-02 14:05:16 wc-failed 31.90 {"Items":[{"ProductName":"Schiuma da Barba Pour Homme","Sku":"45396","Quantity":"2","ItemCost":"12.00","ItemTotal":"24","Category":"POUR HOMME;LINEE UOMO;RASATURA"},{"ProductName":"Detergente Intimo Delicato Mamma","Sku":"38420","Quantity":"1","ItemCost":"11.00","ItemTotal":"11","Category":"POUR HOMME;LINEE UOMO;RASATURA"}]}
如果您对如何构建更好的 DataFrame 而不是我的建议有任何建议,我很高兴阅读并改变主意。正如我所说,我刚开始,我真的很感激任何建议。
PS:如果您也可以解释您提供的解决方案,我会非常高兴,因为我实际上正在尝试学习数据操作,不仅有解决方案而且理解它真的很有帮助。
感谢所有愿意花时间帮助我的人!
解决方案
使用pd.json_normalize()
,如下:
假设您的 json 文件名为data
:
df = pd.json_normalize(data['Orders'])
pd.json_normalize()
将半结构化 JSON 数据规范化为平面表。由于它展平了嵌套结构,您可以访问 JSON 中的所有字段。
您需要指定第一个标签Orders
才能访问和展开其中的列内容。否则,您只会得到一个 column Orders
。
结果:
print(df)
OrderID Email AnnoNascita Age Gender Provincia OrderDate OrderStatus OrderTotal TotalDiscount OrderSubTotal Coupon OrderItems.Item.ProductName OrderItems.Item.Sku OrderItems.Item.Quantity OrderItems.Item.ItemCost OrderItems.Item.ItemTotal OrderItems.Item.Category OrderItems.Item
0 1000004209 name@mail.com - - - CR 2019-05-02 14:05:16 wc-failed 31.90 0 31.9 Eau de Parfum Zafferano 44160 1 27.00 27 ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO) NaN
1 1000004210 name@mail.com - - - GE 2019-05-02 14:17:32 wc-cancelled 9.00 0 9 Sapone Marsiglia 200 g 01026 1 4.10 4.1 MARSEILLE;SAPONETTE NaN
2 1000004211 name@mail.com - - - GE 2019-05-02 14:21:42 wc-cancelled 31.90 0 31.9 Eau de Parfum Zafferano 44160 1 27.00 27 ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO) NaN
3 1000004235 name@mail.com - - - CR 2019-05-03 09:37:06 wc-cancelled 31.90 0 31.9 NaN NaN NaN NaN NaN NaN [{'ProductName': 'Eau de Parfum Zafferano', 'Sku': '44160', 'Quantity': '1', 'ItemCost': '27.00', 'ItemTotal': '27', 'Category': 'ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)'}, {'ProductName': 'Sapone Vegetale Lavanda Officinalis Bio', 'Sku': '01049', 'Quantity': '1', 'ItemCost': '4.90', 'ItemTotal': '4.9', 'Category': 'ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)'}]
4 1000004292 name@mail.com - - - CR 2019-05-06 08:52:47 wc-failed 64.90 0 64.9 NaN NaN NaN NaN NaN NaN [{'ProductName': 'Schiuma da Barba Pour Homme', 'Sku': '45396', 'Quantity': '2', 'ItemCost': '12.00', 'ItemTotal': '24', 'Category': 'POUR HOMME;LINEE UOMO;RASATURA'}, {'ProductName': 'Detergente Intimo Delicato Mamma', 'Sku': '38420', 'Quantity': '1', 'ItemCost': '11.00', 'ItemTotal': '11', 'Category': 'POUR HOMME;LINEE UOMO;RASATURA'}, {'ProductName': 'Spray per Ambiente - Preziosa', 'Sku': '44231', 'Quantity': '2', 'ItemCost': '10.00', 'ItemTotal': '20', 'Category': 'POUR HOMME;LINEE UOMO;RASATURA'}, {'ProductName': 'Cuscinetti Profumati - Preziosa', 'Sku': '45491', 'Quantity': '1', 'ItemCost': '9.90', 'ItemTotal': '9.9', 'Category': 'POUR HOMME;LINEE UOMO;RASATURA'}]
推荐阅读
- maven - 荆棘跑目标
- android - NullPointerException from Firebase Messaging in Google Play Console Reports
- python - While executing the below script I get the error "'chromedriver' executable needs to be in PATH"
- css - How to select in css 1st 4th 5th or 2nd 3rd 6th element
- node.js - 尝试在节点中呈现 JSX 时出现意外令牌
- react-native - TouchableNativeFeedback onLongPress radius not affected
- c# - How to fill the rectangle with some colors, using gradient but only as a tool(without visible gradient)?
- angular - Angular 6 Reactive Form:如何修复 TypeError:formDirective.resetForm() 不是函数
- xamarin.forms - FFImageloading - 检查图像是否存在于缓存中?
- salesforce - 在 Salesforce 相关内容(如联系人)中设置自定义字段(或)查找对象的样式