首页 > 解决方案 > 几个具有重复键但值不同且列中没有限制的字典

问题描述

这里的数据集在字典中具有无限键。行中的明细栏可能有不同的信息产品,具体取决于客户。

ID  Name    Detail
1   Sara    [{"Personal":{"ID":"001","Name":"Sara","Type":"01","TypeName":"Book"},"Order":[{"ID":"0001","Date":"20200222","ProductID":"C0123","ProductName":"ABC", "Price":"4"}]},{"Personal":{"ID":"001","Name":"Sara","Type":"02","TypeName":"Food"},"Order":[{"ID":"0004","Date":"20200222","ProductID":"D0123","ProductName":"Small beef", "Price":"15"}]},{"Personal":{"ID":"001","Name":"Sara","Type":"02","TypeName":"Food"},"Order":[{"ID":"0005","Date":"20200222","ProductID":"D0200","ProductName":"Shrimp", "Price":"28"}]}]
2   Frank   [{"Personal":{"ID":"002","Name":"Frank","Type":"02","TypeName":"Food"},"Order":[{"ID":"0008","Date":"20200228","ProductID":"D0288","ProductName":"Salmon", "Price":"24"}]}]

我的预期输出是

ID Name Personal_ID Personal_Name Personal_Type Personal_TypeName Personal_Order_ID Personal_Order_Date Personal_Order_ProductID Personal_Order_ProductName Personal_Order_Price    
1  Sara 001         Sara          01            Book              0001              20200222            C0123                    ABC                          4    
2  Sara 001         Sara          02            Food              0004              20200222            D0123                    Small beef                   15
3  Sara 001         Sara          02            Food              0005              20200222            D0200                    Shrimp                       28
4  Frank 002        Frank         02            Food              0008              20200228            D0288                    Salmon                       24

标签: pythonpandas

解决方案


所以基本上你的细节列中有一个嵌套的 JSON,你需要将它分解成一个 df 然后与你的原始合并。

import pandas as pd
import json
from pandas import json_normalize

#create empty df to hold the detail information
detailDf = pd.DataFrame()
#We will need to loop over each row to read each JSON
for ind, row in df.iterrows():
    #Read the json, make it a DF, then append the information to the empty DF
    detailDf = detailDf.append(json_normalize(json.loads(row['Detail']), record_path = ('Order'), meta = [['Personal','ID'], ['Personal','Name'], ['Personal','Type'],['Personal','TypeName']]))

# Personally, you don't really need the rest of the code, as the columns Personal.Name
# and Personal.ID is the same information, but none the less.

# You will have to merge on name and ID
df = df.merge(detailDf, how = 'right', left_on = [df['Name'], df['ID']], right_on = [detailDf['Personal.Name'], detailDf['Personal.ID'].astype(int)])

#Clean up
df.rename(columns = {'ID_x':'ID', 'ID_y':'Personal_Order_ID'}, inplace = True)
df.drop(columns = {'Detail', 'key_1', 'key_0'}, inplace = True)

如果您查看我的评论,我建议detailDf您将其用作最终 df,因为确实不需要合并,并且该信息已经在 Detail JSON 中。


推荐阅读