python - 几个具有重复键但值不同且列中没有限制的字典
问题描述
这里的数据集在字典中具有无限键。行中的明细栏可能有不同的信息产品,具体取决于客户。
ID Name Detail
1 Sara [{"Personal":{"ID":"001","Name":"Sara","Type":"01","TypeName":"Book"},"Order":[{"ID":"0001","Date":"20200222","ProductID":"C0123","ProductName":"ABC", "Price":"4"}]},{"Personal":{"ID":"001","Name":"Sara","Type":"02","TypeName":"Food"},"Order":[{"ID":"0004","Date":"20200222","ProductID":"D0123","ProductName":"Small beef", "Price":"15"}]},{"Personal":{"ID":"001","Name":"Sara","Type":"02","TypeName":"Food"},"Order":[{"ID":"0005","Date":"20200222","ProductID":"D0200","ProductName":"Shrimp", "Price":"28"}]}]
2 Frank [{"Personal":{"ID":"002","Name":"Frank","Type":"02","TypeName":"Food"},"Order":[{"ID":"0008","Date":"20200228","ProductID":"D0288","ProductName":"Salmon", "Price":"24"}]}]
我的预期输出是
ID Name Personal_ID Personal_Name Personal_Type Personal_TypeName Personal_Order_ID Personal_Order_Date Personal_Order_ProductID Personal_Order_ProductName Personal_Order_Price
1 Sara 001 Sara 01 Book 0001 20200222 C0123 ABC 4
2 Sara 001 Sara 02 Food 0004 20200222 D0123 Small beef 15
3 Sara 001 Sara 02 Food 0005 20200222 D0200 Shrimp 28
4 Frank 002 Frank 02 Food 0008 20200228 D0288 Salmon 24
解决方案
所以基本上你的细节列中有一个嵌套的 JSON,你需要将它分解成一个 df 然后与你的原始合并。
import pandas as pd
import json
from pandas import json_normalize
#create empty df to hold the detail information
detailDf = pd.DataFrame()
#We will need to loop over each row to read each JSON
for ind, row in df.iterrows():
#Read the json, make it a DF, then append the information to the empty DF
detailDf = detailDf.append(json_normalize(json.loads(row['Detail']), record_path = ('Order'), meta = [['Personal','ID'], ['Personal','Name'], ['Personal','Type'],['Personal','TypeName']]))
# Personally, you don't really need the rest of the code, as the columns Personal.Name
# and Personal.ID is the same information, but none the less.
# You will have to merge on name and ID
df = df.merge(detailDf, how = 'right', left_on = [df['Name'], df['ID']], right_on = [detailDf['Personal.Name'], detailDf['Personal.ID'].astype(int)])
#Clean up
df.rename(columns = {'ID_x':'ID', 'ID_y':'Personal_Order_ID'}, inplace = True)
df.drop(columns = {'Detail', 'key_1', 'key_0'}, inplace = True)
如果您查看我的评论,我建议detailDf
您将其用作最终 df,因为确实不需要合并,并且该信息已经在 Detail JSON 中。
推荐阅读
- sql - SQL 计算 Redshift 中的总持续时间
- graphviz - 修复graphviz中簇的位置
- c - AFL-fuzz 没有发现任何崩溃
- r - 计算一个值出现的次数并将结果添加到列中
- javascript - Material UI Autocomplete Multiple - 从选项列表中删除已选择的标签
- android - Firebase 实时数据库 android 缓存增长巨大
- django - 带有 docker-compose 和 Django 的 AWS Postgresql RDS 实例“不存在”
- javascript - 过滤多个选择框
- reactjs - 如何在 Ionic React 中的页面更改后更改上下文的值而不重置它?
- jquery - jQuery 在 .ajax 成功时继续下一个循环(.each)