Massaging a CSV DataFrame into a dictionary-style format

Problem description

I have a pandas DataFrame named tshirt_orders, obtained from an API call, that looks like this:

Alice, small, red  
Alice, small, green  
Bob, small, blue  
Bob, small, orange  
Cesar, medium, yellow  
David, large, purple  

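How can I convert it into a dictionary-style format, arranged first by size, with the names as sub-keys under each size and a list of colors under each name, so that I can address it when iterating over tshirt_orders?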

Like this:

size:
    small:
        Name:
            Alice:
                Color:
                    red
                    green
            Bob:
                Color:
                    blue
                    orange
    medium:
        Name:
            Cesar:
                Color:
                    yellow
    large:
        Name:
            David:
                Color:
                    purple

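What is the best way to make this transformation? The data is in a pandas DataFrame, but switching to something else is not a problem if there is a better solution.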

Tags: python-3.x, pandas, dictionary

Solution


The closest match to this format is to write the DataFrame to YAML.

First, build the nested dictionary with a dictionary comprehension:

print (df)
       A       B       C
0  Alice   small     red
1  Alice   small   green
2    Bob   small    blue
3    Bob   small  orange
4  Cesar  medium  yellow
5  David   large  purple

d = {k:v.groupby('A', sort=False)['C'].apply(list).to_dict() 
      for k, v in df.groupby('B', sort=False)}
print (d)
{'small': {'Alice': ['red', 'green'], 
           'Bob': ['blue', 'orange']}, 
'medium': {'Cesar': ['yellow']}, 
'large': {'David': ['purple']}}
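
For reference, here is a minimal, self-contained sketch of the step above. It assumes the six rows from the question are loaded into a DataFrame with the column names A, B and C used in this answer (the real column names returned by the API are not given), and it finishes by iterating over the result, which is what the question asks for.

```python
import pandas as pd

# Sample rows from the question. The column names A/B/C are this answer's
# placeholders, not necessarily what the API returns.
df = pd.DataFrame(
    [
        ("Alice", "small", "red"),
        ("Alice", "small", "green"),
        ("Bob", "small", "blue"),
        ("Bob", "small", "orange"),
        ("Cesar", "medium", "yellow"),
        ("David", "large", "purple"),
    ],
    columns=["A", "B", "C"],
)

# Outer groupby: one entry per size. Inner groupby: name -> list of colors.
# sort=False keeps groups in order of first appearance.
d = {
    size: group.groupby("A", sort=False)["C"].apply(list).to_dict()
    for size, group in df.groupby("B", sort=False)
}

# Walk the nested dictionary: size -> name -> colors.
lines = []
for size, names in d.items():
    for name, colors in names.items():
        lines.append(f"{size}: {name} ordered {', '.join(colors)}")
print("\n".join(lines))
```

Each level is an ordinary dict, so a single entry can also be addressed directly, for example d['small']['Bob'].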

Add size as the top-level key of the dictionary, then write it to a YAML file:

import yaml
with open('result.yml', 'w') as yaml_file:
    yaml.dump({'size': d}, yaml_file, default_flow_style=False, sort_keys=False)

size:
  small:
    Alice:
    - red
    - green
    Bob:
    - blue
    - orange
  medium:
    Cesar:
    - yellow
  large:
    David:
    - purple

Or create JSON:

import json

with open("result.json", "w") as json_file:
    json.dump({'size': d}, json_file, indent=4)

{
    "size": {
        "small": {
            "Alice": [
                "red",
                "green"
            ],
            "Bob": [
                "blue",
                "orange"
            ]
        },
        "medium": {
            "Cesar": [
                "yellow"
            ]
        },
        "large": {
            "David": [
                "purple"
            ]
        }
    }
}
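
As a quick check that nothing is lost in the JSON form, the sketch below serializes the same structure to a string and parses it back using only the standard library. The dictionary d is written out literally here so the snippet runs on its own; in practice it would come from the dictionary comprehension earlier in this answer.

```python
import json

# Literal copy of the nested dictionary printed earlier in this answer.
d = {
    "small": {"Alice": ["red", "green"], "Bob": ["blue", "orange"]},
    "medium": {"Cesar": ["yellow"]},
    "large": {"David": ["purple"]},
}

# Serialize to a string instead of a file, then parse it back.
text = json.dumps({"size": d}, indent=4)
loaded = json.loads(text)

# Iterate over the parsed structure: size -> name -> colors.
for size, names in loaded["size"].items():
    for name, colors in names.items():
        print(size, name, colors)
```

All keys here are strings and all leaves are lists of strings, so the parsed result compares equal to the original dictionary.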

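Edit: to also get the Name and Color levels from the question, add helper columns holding those labels and build the dictionary recursively: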

df = df.assign(A1='Name', B1='size', C1='Color')

df1 = df.groupby(['B1','B','A1','A','C1'], sort=False)['C'].apply(list).reset_index()

# https://stackoverflow.com/a/19900276
def recur_dictify(frame):
    # Base case: only one column is left, so return its value(s).
    if len(frame.columns) == 1:
        if frame.values.size == 1:
            return frame.values[0][0]
        return frame.values.squeeze()
    # Group by the first column and recurse on the remaining columns.
    grouped = frame.groupby(frame.columns[0], sort=False)
    return {k: recur_dictify(g.iloc[:, 1:]) for k, g in grouped}

d = recur_dictify(df1)
print (d)
{'size': {'small': {'Name': {'Alice': {'Color': ['red', 'green']}, 
                             'Bob': {'Color': ['blue', 'orange']}}}, 
         'medium': {'Name': {'Cesar': {'Color': ['yellow']}}}, 
         'large': {'Name': {'David': {'Color': ['purple']}}}}}

import yaml
with open('result.yml', 'w') as yaml_file:
    yaml.dump(d, yaml_file, default_flow_style=False, sort_keys=False)
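
Since the question allows moving away from pandas, the same fully nested structure can also be built with plain dictionaries. This is an alternative sketch, not part of the answer's approach: it assumes the orders are available as (name, size, color) tuples, and the function name nest_orders is made up for illustration.

```python
def nest_orders(rows):
    """Build {'size': {size: {'Name': {name: {'Color': [colors]}}}}}."""
    sizes = {}
    for name, size, color in rows:
        # Create the size level (with its 'Name' sub-dict) on first sight.
        names = sizes.setdefault(size, {"Name": {}})["Name"]
        # Create the name level (with its 'Color' list) on first sight.
        names.setdefault(name, {"Color": []})["Color"].append(color)
    return {"size": sizes}


# Sample rows from the question, as (name, size, color) tuples.
rows = [
    ("Alice", "small", "red"),
    ("Alice", "small", "green"),
    ("Bob", "small", "blue"),
    ("Bob", "small", "orange"),
    ("Cesar", "medium", "yellow"),
    ("David", "large", "purple"),
]

nested = nest_orders(rows)

# Iterate: size -> name -> list of colors.
for size, by_name in nested["size"].items():
    for name, entry in by_name["Name"].items():
        print(size, name, entry["Color"])
```

Plain dictionaries keep insertion order in Python 3.7 and later, so sizes and names appear in first-seen order, which matches sort=False in the pandas version.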
