Where do I add groupby to export JSON correctly?

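Problem description

I want to export my DataFrame to JSON in a specific format, but I can't get the grouping right and I don't know where to apply it.

DataFrame: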

  item_type  purch_price sale_price         city    location
0    Iphone         1200       1150          NaN         NaN
1  Computer          700        NaN  Los Angeles  1st street
2  Computer          700        NaN     San Jose  2nd street

Current code:

import json
import pandas as pd

df = pd.read_csv(r'filepath', delimiter=';', header=0)

df = df.fillna('')

# Build a one-element list holding the shop's city and location,
# or an empty list when either value is missing.
def shop_details(row):
    if row['city'] != '' and row['location'] != '':
        return [{'city': row['city'], 'location': row['location']}]
    return []

df['shop_details'] = df.apply(shop_details, axis=1)

df = df.drop(['city', 'location'], axis = 1)

def print_json(text):
    parsed = json.loads(text)
    print(json.dumps(parsed, indent=4, sort_keys=False))

print_json(df.to_json(orient='records'))

Current result:

[
    {
        "item_type": "Iphone",
        "purch_price": 1200,
        "sale_price": 1150.0,
        "shop_details": []
    },
    {
        "item_type": "Computer",
        "purch_price": 700,
        "sale_price": "",
        "shop_details": [
            {
                "city": "Los Angeles",
                "location": "1st street"
            }
        ]
    },
    {
        "item_type": "Computer",
        "purch_price": 700,
        "sale_price": "",
        "shop_details": [
            {
                "city": "San Jose",
                "location": "2nd street"
            }
        ]
    }
]

Expected output:

[{
    "item_type": "Iphone",
    "purch_price": "1200",
    "sale_price": "1150",
    "shop_details": []
  },
  {
    "item_type": "Computer",
    "purch_price": "700",
    "sale_price": "600",
    "shop_details": [{
        "city": "Los Angeles",
        "location": "1st street"
      },
      {
        "city": "San Jose",
        "location": "2nd street"
      }
    ]
  }
]

Tags: python, json, pandas, dataframe, group-by

Solution


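After dropping the city and location columns, use groupby with 'sum' to concatenate your lists of dicts within each group, then reset the index. 'sum' works here because adding two Python lists concatenates them: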

df_grouped=df.groupby(by=['item_type','purch_price','sale_price']).agg({'shop_details':'sum'}).reset_index()
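For reference, the whole pipeline can be sketched end to end as below. This is a minimal sketch under a few assumptions: the sample rows from the question are typed inline in place of the CSV file, and the lists are flattened with itertools.chain rather than 'sum', which is a swapped-in alternative that does not depend on how a given pandas version sums object columns.

```python
import json
from itertools import chain

import pandas as pd

# Inline copy of the sample data from the question (stands in for the CSV file).
df = pd.DataFrame({
    "item_type": ["Iphone", "Computer", "Computer"],
    "purch_price": [1200, 700, 700],
    "sale_price": [1150, None, None],
    "city": [None, "Los Angeles", "San Jose"],
    "location": [None, "1st street", "2nd street"],
})
df = df.fillna("")


def shop_details(row):
    """Return a one-element list with the shop, or an empty list if none is given."""
    if row["city"] != "" and row["location"] != "":
        return [{"city": row["city"], "location": row["location"]}]
    return []


df["shop_details"] = df.apply(shop_details, axis=1)
df = df.drop(["city", "location"], axis=1)

# Group on the remaining columns and flatten each group's lists into one list.
df_grouped = (
    df.groupby(["item_type", "purch_price", "sale_price"])["shop_details"]
    .agg(lambda lists: list(chain.from_iterable(lists)))
    .reset_index()
)

records = json.loads(df_grouped.to_json(orient="records"))
print(json.dumps(records, indent=4))
```

The grouping has to happen after shop_details is built and city and location are dropped, so that rows differing only in shop collapse into a single record.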

Passing the grouped frame to your print_json then gives:

[
{
    "item_type": "Computer",
    "purch_price": 700,
    "sale_price": "",
    "shop_details": [
        {
            "city": "'Los Angeles'",
            "location": "'1st street'"
        },
        {
            "city": "'San Jose'",
            "location": "'2nd street'"
        }
    ]
},
{
    "item_type": "Iphone",
    "purch_price": 1200,
    "sale_price": 1150.0,
    "shop_details": []
}
]
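The records above come out with Computer before Iphone because groupby sorts the group keys by default, whereas the expected output in the question lists Iphone first. If the order of first appearance matters, passing sort=False to groupby should keep it. The sketch below uses a cut-down, illustrative frame (only two key columns and city-only dicts) and a hypothetical helper named concat_lists:

```python
import pandas as pd

# Cut-down stand-in for the question's frame after shop_details was built.
df = pd.DataFrame({
    "item_type": ["Iphone", "Computer", "Computer"],
    "purch_price": [1200, 700, 700],
    "shop_details": [[], [{"city": "Los Angeles"}], [{"city": "San Jose"}]],
})


def concat_lists(lists):
    """Concatenate the lists of one group into a single list (hypothetical helper)."""
    out = []
    for lst in lists:
        out.extend(lst)
    return out


keys = ["item_type", "purch_price"]

# Default behaviour: group keys are sorted.
sorted_keys = df.groupby(keys)["shop_details"].agg(concat_lists).reset_index()

# sort=False: groups appear in the order they are first seen in the data.
first_seen = df.groupby(keys, sort=False)["shop_details"].agg(concat_lists).reset_index()

print(sorted_keys["item_type"].tolist())
print(first_seen["item_type"].tolist())
```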

Note that I'm not sure where your sale price of 600 comes from, since it does not appear in your original DataFrame.

