首页 > 解决方案 > Python - 管理嵌套的 JSON 到 DataFrame

问题描述

我正处于我的数据科学家之旅的开始阶段,我正在努力使用 JSON 和 Python。即使我知道 DataFrame 操作和 JSON 格式操作的基础知识,我也只是得到了一个包含以下数据的 JSON 文件:

{"Orders":
    [
        {
            "OrderID":"1000004209",
            "Email":"name@mail.com",
            "AnnoNascita":"-",
            "Age":"-",
            "Gender":"-",
            "Provincia":"CR",
            "OrderDate":"2019-05-02 14:05:16",
            "OrderStatus":"wc-failed",
            "OrderTotal":"31.90",
            "TotalDiscount":"0",
            "OrderSubTotal":"31.9",
            "Coupon":"",
            "OrderItems": {
                "Item": {
                    "ProductName":"Eau de Parfum Zafferano",
                    "Sku":"44160",
                    "Quantity":"1",
                    "ItemCost":"27.00",
                    "ItemTotal":"27",
                    "Category":"ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)"
                }
            }
        },
        {
            "OrderID":"1000004210",
            "Email":"name@mail.com",
            "AnnoNascita":"-",
            "Age":"-",
            "Gender":"-",
            "Provincia":"GE",
            "OrderDate":"2019-05-02 14:17:32",
            "OrderStatus":"wc-cancelled",
            "OrderTotal":"9.00",
            "TotalDiscount":"0",
            "OrderSubTotal":"9",
            "Coupon":"",
            "OrderItems": {
                "Item": {
                    "ProductName":"Sapone Marsiglia 200 g",
                    "Sku":"01026",
                    "Quantity":"1",
                    "ItemCost":"4.10",
                    "ItemTotal":"4.1",
                    "Category":"MARSEILLE;SAPONETTE"
                }
            }
        },
        {
            "OrderID":"1000004211",
            "Email":"name@mail.com",
            "AnnoNascita":"-",
            "Age":"-",
            "Gender":"-",
            "Provincia":"GE",
            "OrderDate":"2019-05-02 14:21:42",
            "OrderStatus":"wc-cancelled",
            "OrderTotal":"31.90",
            "TotalDiscount":"0",
            "OrderSubTotal":"31.9",
            "Coupon":"",
            "OrderItems": {
                "Item": {
                    "ProductName":"Eau de Parfum Zafferano",
                    "Sku":"44160",
                    "Quantity":"1",
                    "ItemCost":"27.00",
                    "ItemTotal":"27",
                    "Category":"ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)"
                }
            }
        },
        {
            "OrderID":"1000004235",
            "Email":"name@mail.com",
            "AnnoNascita":"-",
            "Age":"-",
            "Gender":"-",
            "Provincia":"CR",
            "OrderDate":"2019-05-03 09:37:06",
            "OrderStatus":"wc-cancelled",
            "OrderTotal":"31.90",
            "TotalDiscount":"0",
            "OrderSubTotal":"31.9",
            "Coupon":"",
            "OrderItems": {
                "Item": [
                    {
                        "ProductName":"Eau de Parfum Zafferano",
                        "Sku":"44160",
                        "Quantity":"1",
                        "ItemCost":"27.00",
                        "ItemTotal":"27",
                        "Category":"ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)"
                    },
                    {
                        "ProductName":"Sapone Vegetale Lavanda Officinalis Bio",
                        "Sku":"01049",
                        "Quantity":"1",
                        "ItemCost":"4.90",
                        "ItemTotal":"4.9",
                        "Category":"ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)"
                    }
                ]
            }
        },
        {
            "OrderID":"1000004292",
            "Email":"name@mail.com",
            "AnnoNascita":"-",
            "Age":"-",
            "Gender":"-",
            "Provincia":"CR",
            "OrderDate":"2019-05-06 08:52:47",
            "OrderStatus":"wc-failed",
            "OrderTotal":"64.90",
            "TotalDiscount":"0",
            "OrderSubTotal":"64.9",
            "Coupon":"",
            "OrderItems": {
                "Item": [
                    {
                        "ProductName":"Schiuma da Barba Pour Homme",
                        "Sku":"45396",
                        "Quantity":"2",
                        "ItemCost":"12.00",
                        "ItemTotal":"24",
                        "Category":"POUR HOMME;LINEE UOMO;RASATURA"
                    },
                    {
                        "ProductName":"Detergente Intimo Delicato Mamma",
                        "Sku":"38420",
                        "Quantity":"1",
                        "ItemCost":"11.00",
                        "ItemTotal":"11",
                        "Category":"POUR HOMME;LINEE UOMO;RASATURA"
                    },
                    {
                        "ProductName":"Spray per Ambiente - Preziosa",
                        "Sku":"44231",
                        "Quantity":"2",
                        "ItemCost":"10.00",
                        "ItemTotal":"20",
                        "Category":"POUR HOMME;LINEE UOMO;RASATURA"
                    },
                    {
                        "ProductName":"Cuscinetti Profumati - Preziosa",
                        "Sku":"45491",
                        "Quantity":"1",
                        "ItemCost":"9.90",
                        "ItemTotal":"9.9",
                        "Category":"POUR HOMME;LINEE UOMO;RASATURA"
                    }
                ]
            }
        },
    ]
}

我想做的是用 Pandas 构建一个 DataFrame 来操作数据和收集信息。

起初,我尝试使用pd.read_json('path_to_file')Pandas 中的函数,但得到了以下结果:

                                            Orders
0   {'OrderID': '1000004209', 'Email': 'name@ma...
1   {'OrderID': '1000004210', 'Email': 'name@ma...
2   {'OrderID': '1000004211', 'Email': 'name@ma...
3   {'OrderID': '1000004235', 'Email': 'name@ma...
4   {'OrderID': '1000004292', 'Email': 'name@ma...

我尝试使用从每一行获取一个 DatFrame pd.DataFrame(df['Orders']),但它返回相同的 DataFrame。我尝试使用 for 循环在新的 DataFrame 中附加单行,但我也走到了一条死胡同。

在 StackOverflow 上查看与此相关的所有主题时,我非常迷茫,却没有找到解决问题的方法。

实际上,我需要创建一个 DataFrame,其中每个主要值都有一个列(如“OrderID”、“Email”、“AnnoNascita”等)和一个名为“OrderItems”的列,其中包含“Item”中的所有值的数组”。我在想类似的事情:

        OrderID             Email   AnnoNascita   Age     Gender     Provincia               OrderDate     OrderStatus     OrderTotal     Coupon     OrderItems
0    1000004209     name@mail.com             -     -          -            CR     2019-05-02 14:05:16       wc-failed          31.90                {"Items":[{"ProductName":"Schiuma da Barba Pour Homme","Sku":"45396","Quantity":"2","ItemCost":"12.00","ItemTotal":"24","Category":"POUR HOMME;LINEE UOMO;RASATURA"},{"ProductName":"Detergente Intimo Delicato Mamma","Sku":"38420","Quantity":"1","ItemCost":"11.00","ItemTotal":"11","Category":"POUR HOMME;LINEE UOMO;RASATURA"}]}

如果您对如何构建更好的 DataFrame 而不是我的建议有任何建议,我很高兴阅读并改变主意。正如我所说,我刚开始,我真的很感激任何建议。

PS:如果您也可以解释您提供的解决方案,我会非常高兴,因为我实际上正在尝试学习数据操作,不仅有解决方案而且理解它真的很有帮助。

感谢所有愿意花时间帮助我的人!

标签: pythonjsonpandasdataframedata-analysis

解决方案


使用pd.json_normalize(),如下:

假设您的 json 文件名为data

df = pd.json_normalize(data['Orders'])

pd.json_normalize()将半结构化 JSON 数据规范化为平面表。由于它展平了嵌套结构,您可以访问 JSON 中的所有字段。

您需要指定第一个标签Orders才能访问和展开其中的列内容。否则,您只会得到一个 column Orders

结果:

print(df)

      OrderID          Email AnnoNascita Age Gender Provincia            OrderDate   OrderStatus OrderTotal TotalDiscount OrderSubTotal Coupon OrderItems.Item.ProductName OrderItems.Item.Sku OrderItems.Item.Quantity OrderItems.Item.ItemCost OrderItems.Item.ItemTotal                           OrderItems.Item.Category                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          OrderItems.Item
0  1000004209  name@mail.com           -   -      -        CR  2019-05-02 14:05:16     wc-failed      31.90             0          31.9            Eau de Parfum Zafferano               44160                        1                    27.00                        27  ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      NaN
1  1000004210  name@mail.com           -   -      -        GE  2019-05-02 14:17:32  wc-cancelled       9.00             0             9             Sapone Marsiglia 200 g               01026                        1                     4.10                       4.1                                MARSEILLE;SAPONETTE                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      NaN
2  1000004211  name@mail.com           -   -      -        GE  2019-05-02 14:21:42  wc-cancelled      31.90             0          31.9            Eau de Parfum Zafferano               44160                        1                    27.00                        27  ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      NaN
3  1000004235  name@mail.com           -   -      -        CR  2019-05-03 09:37:06  wc-cancelled      31.90             0          31.9                                NaN                 NaN                      NaN                      NaN                       NaN                                                NaN                                                                                                                                                                                                                                                                                                             [{'ProductName': 'Eau de Parfum Zafferano', 'Sku': '44160', 'Quantity': '1', 'ItemCost': '27.00', 'ItemTotal': '27', 'Category': 'ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)'}, {'ProductName': 'Sapone Vegetale Lavanda Officinalis Bio', 'Sku': '01049', 'Quantity': '1', 'ItemCost': '4.90', 'ItemTotal': '4.9', 'Category': 'ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)'}]
4  1000004292  name@mail.com           -   -      -        CR  2019-05-06 08:52:47     wc-failed      64.90             0          64.9                                NaN                 NaN                      NaN                      NaN                       NaN                                                NaN  [{'ProductName': 'Schiuma da Barba Pour Homme', 'Sku': '45396', 'Quantity': '2', 'ItemCost': '12.00', 'ItemTotal': '24', 'Category': 'POUR HOMME;LINEE UOMO;RASATURA'}, {'ProductName': 'Detergente Intimo Delicato Mamma', 'Sku': '38420', 'Quantity': '1', 'ItemCost': '11.00', 'ItemTotal': '11', 'Category': 'POUR HOMME;LINEE UOMO;RASATURA'}, {'ProductName': 'Spray per Ambiente - Preziosa', 'Sku': '44231', 'Quantity': '2', 'ItemCost': '10.00', 'ItemTotal': '20', 'Category': 'POUR HOMME;LINEE UOMO;RASATURA'}, {'ProductName': 'Cuscinetti Profumati - Preziosa', 'Sku': '45491', 'Quantity': '1', 'ItemCost': '9.90', 'ItemTotal': '9.9', 'Category': 'POUR HOMME;LINEE UOMO;RASATURA'}]


推荐阅读