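python - I need a multi-valued field (array) for the email column, grouped by email type (personal and professional), in JSON format
Problem description
I need the email column as a multi-valued field (array) per email type (personal and professional) in the JSON output. Currently I get a separate record for each email address, but I want to group by email type so that each email type holds a list of its email addresses.
Input file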
"source_id"|"first_name"|"last_name"|"address_type"|"address_line_1"|"city"|"email_type"|"email"
"41614335"|Reinaldo|Tonkoski Jr.|Primary|Deh 211 Box 2222|Brookings|"personal"|nag@gmail.com
"41614335"|Reinaldo|Tonkoski Jr.|home|"2409 10th St Apt 123"|Brookings|"professional"|cook@gmail.com
"07605348"|E|Christodoulou|Primary|"4D Ag Lavras st"|Kifissia|"personal"|root@gmail.com
"07605348"|E|Christodoulou|home|"131 N Hamilton Dr Apt 308"|Beverly Hills|"professional"|willy@gmail.com
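The file is pipe-delimited and only some fields are quoted, which pandas can read directly. A minimal sketch of loading it, assuming the rows above; `RAW` is an illustrative inline copy of the file, not something from the question:

```python
import io

import pandas as pd

# Inline copy of the sample rows so the sketch runs on its own (illustrative only).
RAW = '''"source_id"|"first_name"|"last_name"|"address_type"|"address_line_1"|"city"|"email_type"|"email"
"41614335"|Reinaldo|Tonkoski Jr.|Primary|Deh 211 Box 2222|Brookings|"personal"|nag@gmail.com
"41614335"|Reinaldo|Tonkoski Jr.|home|"2409 10th St Apt 123"|Brookings|"professional"|cook@gmail.com
"07605348"|E|Christodoulou|Primary|"4D Ag Lavras st"|Kifissia|"personal"|root@gmail.com
"07605348"|E|Christodoulou|home|"131 N Hamilton Dr Apt 308"|Beverly Hills|"professional"|willy@gmail.com
'''

# sep="|" matches the delimiter; the default quotechar '"' strips the quotes.
df = pd.read_csv(io.StringIO(RAW), sep="|")
print(df[["source_id", "email_type", "email"]])
```

Passing `dtype=str` to `read_csv` would keep `source_id` as text, which preserves the leading zero in `07605348` if that matters downstream.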
Actual output
[
  {
    "source_id":7605348,
    "address":[
      {
        "address_line_1":"4D Ag Lavras st",
        "city":"Kifissia"
      },
      {
        "address_line_1":"131 N Hamilton Dr Apt 308",
        "city":"Beverly Hills"
      }
    ],
    "email":[
      {
        "email_type":"personal",
        "email":"tonkoski@ieee.org"
      },
      {
        "email":"tonkoski@ieee.org",
        "email_type":"professional"
      }
    ],
    "first_name":"E",
    "last_name":"Christodoulou"
  },
  {
    "source_id":41614335,
    "address":[
      {
        "address_line_1":"Deh 211 Box 2222",
        "city":"Brookings"
      },
      {
        "address_line_1":"2409 10th St Apt 123",
        "city":"Brookings"
      }
    ],
    "email":[
      {
        "email_type":"personal",
        "email":"tonkoski@ieee.org"
      },
      {
        "email":"tonkoski@ieee.org",
        "email_type":"professional"
      }
    ],
    "first_name":"Reinaldo",
    "last_name":"Tonkoski Jr."
  }
]
Expected output
[
  {
    "source_id":7605348,
    "address":[
      {
        "address_line_1":"4D Ag Lavras st",
        "city":"Kifissia"
      },
      {
        "address_line_1":"131 N Hamilton Dr Apt 308",
        "city":"Beverly Hills"
      }
    ],
    "email":[
      {
        "email_type":"personal",
        "email":["tonkoski@ieee.org"]
      },
      {
        "email":["tonkoski@ieee.org"],
        "email_type":"professional"
      }
    ],
    "first_name":"E",
    "last_name":"Christodoulou"
  },
  {
    "source_id":41614335,
    "address":[
      {
        "address_line_1":"Deh 211 Box 2222",
        "city":"Brookings"
      },
      {
        "address_line_1":"2409 10th St Apt 123",
        "city":"Brookings"
      }
    ],
    "email":[
      {
        "email_type":"personal",
        "email":["tonkoski@ieee.org"]
      },
      {
        "email":["tonkoski@ieee.org"],
        "email_type":"professional"
      }
    ],
    "first_name":"Reinaldo",
    "last_name":"Tonkoski Jr."
  }
]
Attempted code
import pandas as pd

# df is assumed to hold the pipe-delimited input file, e.g. pd.read_csv(path, sep="|")
dic_address = []
for source, group in df.groupby("source_id"):
    dic = {}
    dic["source_id"] = source
    # to_dict("records") emits one dict per input row, so each email ends up in its own record.
    dic["address"] = group.drop(
        columns=["source_id", "first_name", "last_name", "email_type", "email"]).to_dict("records")
    dic["email"] = group.drop(
        columns=["source_id", "first_name", "last_name", "address_line_1", "city"]).to_dict("records")
    dic["phone"] = group.drop(
        columns=["source_id", "first_name", "last_name", "address_line_1", "city", "email_type", "email"]).to_dict("records")
    dic_address.append(dic)

# Remove duplicate dicts inside each email and phone list.
for i in range(len(dic_address)):
    my_finallist_email = [dict(s) for s in set(frozenset(d.items()) for d in dic_address[i]["email"])]
    dic_address[i]["email"] = my_finallist_email
    my_finallist_phone = [dict(s) for s in set(frozenset(d.items()) for d in dic_address[i]["phone"])]
    dic_address[i]["phone"] = my_finallist_phone

df_add = pd.DataFrame(dic_address)
listval = ["first_name", "last_name"]
# Collapse the name columns to one row per source_id, then join them onto the nested data.
df_source_group = df.drop_duplicates().groupby("source_id")[listval].agg(lambda x: ",".join(set(x))).reset_index().to_dict("records")
df22 = pd.DataFrame(df_source_group)
df_merge = pd.merge(df_add, df22)
Solution
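One way to get the expected shape is to group twice: once by `source_id` to build each person record, and again by `email_type` inside each group so that the addresses of one type are collected into a list. The following is a minimal sketch under that assumption, not a definitive implementation; `RAW` and `build_records` are illustrative names that do not appear in the question.

```python
import io
import json

import pandas as pd

# Inline copy of the sample input so the sketch runs on its own (illustrative only).
RAW = '''"source_id"|"first_name"|"last_name"|"address_type"|"address_line_1"|"city"|"email_type"|"email"
"41614335"|Reinaldo|Tonkoski Jr.|Primary|Deh 211 Box 2222|Brookings|"personal"|nag@gmail.com
"41614335"|Reinaldo|Tonkoski Jr.|home|"2409 10th St Apt 123"|Brookings|"professional"|cook@gmail.com
"07605348"|E|Christodoulou|Primary|"4D Ag Lavras st"|Kifissia|"personal"|root@gmail.com
"07605348"|E|Christodoulou|home|"131 N Hamilton Dr Apt 308"|Beverly Hills|"professional"|willy@gmail.com
'''

df = pd.read_csv(io.StringIO(RAW), sep="|")


def build_records(frame):
    """Return one nested dict per source_id, with emails listed per email_type."""
    records = []
    for source_id, group in frame.groupby("source_id"):
        # One address dict per distinct (address_line_1, city) pair.
        addresses = (
            group[["address_line_1", "city"]]
            .drop_duplicates()
            .to_dict("records")
        )
        # Second grouping: collect the distinct emails of each type into a list.
        # sort=False keeps the types in order of first appearance.
        emails = [
            {"email_type": email_type, "email": sub["email"].drop_duplicates().tolist()}
            for email_type, sub in group.groupby("email_type", sort=False)
        ]
        first_row = group.iloc[0]
        records.append({
            "source_id": int(source_id),  # plain int so json.dumps can serialize it
            "address": addresses,
            "email": emails,
            "first_name": first_row["first_name"],
            "last_name": first_row["last_name"],
        })
    return records


records = build_records(df)
print(json.dumps(records, indent=2))
```

With the sample input each `email` list holds a single address, because every person has exactly one address per type. If a person had a second row with the same `email_type`, both addresses would appear in the same list. The sketch takes `first_name` and `last_name` from the first row of each group, which assumes they are constant within a `source_id`.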