I need a multi-valued field (array) for the email column, grouped by email type (personal and professional), in JSON format

Problem description

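I need a multi-valued field (array) for the email column, grouped by email type (personal and professional), in JSON format. Currently I get a separate record for each email address, but I want to group by email type so that each email type carries an array of all its email addresses.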

Input file

"source_id"|"first_name"|"last_name"|"address_type"|"address_line_1"|"city"|"email_type"|"email"
"41614335"|Reinaldo|Tonkoski Jr.|Primary|Deh 211 Box 2222|Brookings|"personal"|nag@gmail.com
"41614335"|Reinaldo|Tonkoski Jr.|home|"2409 10th St Apt 123"|Brookings|"professional"|cook@gmail.com
"07605348"|E|Christodoulou|Primary|"4D Ag Lavras st"|Kifissia|"personal"|root@gmail.com
"07605348"|E|Christodoulou|home|"131 N Hamilton Dr Apt 308"|Beverly Hills|"professional"|willy@gmail.com

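A minimal sketch of loading this pipe-delimited file with pandas, with the four sample rows embedded in a string so it runs on its own. Passing `dtype={"source_id": str}` is an assumption on my part: it keeps the leading zero of "07605348", whereas the output shown further down (7605348) suggests the original code let pandas parse the column as an integer.

```python
import io

import pandas as pd

# The sample rows from the input file, embedded so the sketch is self-contained.
raw = '''"source_id"|"first_name"|"last_name"|"address_type"|"address_line_1"|"city"|"email_type"|"email"
"41614335"|Reinaldo|Tonkoski Jr.|Primary|Deh 211 Box 2222|Brookings|"personal"|nag@gmail.com
"41614335"|Reinaldo|Tonkoski Jr.|home|"2409 10th St Apt 123"|Brookings|"professional"|cook@gmail.com
"07605348"|E|Christodoulou|Primary|"4D Ag Lavras st"|Kifissia|"personal"|root@gmail.com
"07605348"|E|Christodoulou|home|"131 N Hamilton Dr Apt 308"|Beverly Hills|"professional"|willy@gmail.com
'''

# sep="|" matches the delimiter; the default quote character (") strips the quotes.
# dtype=str for source_id is an assumption that preserves the leading zero.
df = pd.read_csv(io.StringIO(raw), sep="|", dtype={"source_id": str})

print(df[["source_id", "email_type", "email"]])
```

For a real file, the same call should work with a path in place of `io.StringIO(raw)`.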
Output obtained

[
   {
      "source_id":7605348,
      "address":[
         {
            "address_line_1":"4D Ag Lavras st",
            "city":"Kifissia"
         },
         {
            "address_line_1":"131 N Hamilton Dr Apt 308",
            "city":"Beverly Hills"
         }
      ],
      "email":[
         {
            "email_type":"personal",
            "email":"tonkoski@ieee.org"
         },
         {
            "email":"tonkoski@ieee.org",
            "email_type":"professional"
         }
      ],
      "first_name":"E",
      "last_name":"Christodoulou"
   },
   {
      "source_id":41614335,
      "address":[
         {
            "address_line_1":"Deh 211 Box 2222",
            "city":"Brookings"
         },
         {
            "address_line_1":"2409 10th St Apt 123",
            "city":"Brookings"
         }
      ],
      "email":[
         {
            "email_type":"personal",
            "email":"tonkoski@ieee.org"
         },
         {
            "email":"tonkoski@ieee.org",
            "email_type":"professional"
         }
      ],
      "first_name":"Reinaldo",
      "last_name":"Tonkoski Jr."
   }
]

Expected output

[
   {
      "source_id":7605348,
      "address":[
         {
            "address_line_1":"4D Ag Lavras st",
            "city":"Kifissia"
         },
         {
            "address_line_1":"131 N Hamilton Dr Apt 308",
            "city":"Beverly Hills"
         }
      ],
      "email":[
         {
            "email_type":"personal",
            "email":["tonkoski@ieee.org"]
         },
         {
            "email":["tonkoski@ieee.org"],
            "email_type":"professional"
         }
      ],
      "first_name":"E",
      "last_name":"Christodoulou"
   },
   {
      "source_id":41614335,
      "address":[
         {
            "address_line_1":"Deh 211 Box 2222",
            "city":"Brookings"
         },
         {
            "address_line_1":"2409 10th St Apt 123",
            "city":"Brookings"
         }
      ],
      "email":[
         {
            "email_type":"personal",
            "email":["tonkoski@ieee.org"]
         },
         {
            "email":["tonkoski@ieee.org"],
            "email_type":"professional"
         }
      ],
      "first_name":"Reinaldo",
      "last_name":"Tonkoski Jr."
   }
]

Code attempted

import pandas as pd

# Assumes `df` already holds the pipe-delimited input file shown above.
dic_address = []
# Group by a scalar key so `source` is a plain value, not a 1-tuple (pandas >= 2.0).
for source, group in df.groupby("source_id"):
    dic = {}
    dic["source_id"] = source
    # "records" is the full orient name; the abbreviation "record" is rejected by pandas >= 2.0.
    dic["address"] = group.drop(
        columns=["source_id", "first_name", "last_name", "email_type", "email"]).to_dict("records")
    dic["email"] = group.drop(
        columns=["source_id", "first_name", "last_name", "address_line_1", "city"]).to_dict("records")
    # The sample input has no phone columns, so only address_type survives this drop.
    dic["phone"] = group.drop(
        columns=["source_id", "first_name", "last_name", "address_line_1", "city", "email_type", "email"]).to_dict("records")
    dic_address.append(dic)

# De-duplicate once, after the loop, instead of re-scanning the whole list on every iteration.
for entry in dic_address:
    entry["email"] = [dict(s) for s in set(frozenset(d.items()) for d in entry["email"])]
    entry["phone"] = [dict(s) for s in set(frozenset(d.items()) for d in entry["phone"])]

df_add = pd.DataFrame(dic_address)
listval = ['first_name', 'last_name']
# Collapse the name columns to one row per source_id, then attach them to the nested records.
df_source_group = df.drop_duplicates().groupby("source_id")[listval].agg(lambda x: ','.join(set(x))).reset_index()
df_merge = pd.merge(df_add, df_source_group, on="source_id")

Tags: python, json, arraylist

Solution
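One possible approach, offered as a sketch rather than a definitive answer: inside each `source_id` group, run a second `groupby` on `email_type` and collect that type's addresses into a list. The helper names `group_emails` and `build_records` are hypothetical, and the row with `extra@example.com` is not in the original input; it is added only so that one email type actually holds more than one address. `source_id` is kept as a string here, which is also an assumption.

```python
import json

import pandas as pd

columns = ["source_id", "first_name", "last_name", "address_type",
           "address_line_1", "city", "email_type", "email"]
rows = [
    ("41614335", "Reinaldo", "Tonkoski Jr.", "Primary", "Deh 211 Box 2222", "Brookings", "personal", "nag@gmail.com"),
    ("41614335", "Reinaldo", "Tonkoski Jr.", "home", "2409 10th St Apt 123", "Brookings", "professional", "cook@gmail.com"),
    # Hypothetical extra row (not in the original input) to show a multi-valued email list.
    ("41614335", "Reinaldo", "Tonkoski Jr.", "home", "2409 10th St Apt 123", "Brookings", "personal", "extra@example.com"),
    ("07605348", "E", "Christodoulou", "Primary", "4D Ag Lavras st", "Kifissia", "personal", "root@gmail.com"),
    ("07605348", "E", "Christodoulou", "home", "131 N Hamilton Dr Apt 308", "Beverly Hills", "professional", "willy@gmail.com"),
]
df = pd.DataFrame(rows, columns=columns)


def group_emails(group: pd.DataFrame) -> list:
    """Return one {"email_type": ..., "email": [...]} entry per email type."""
    emails = group[["email_type", "email"]].drop_duplicates()
    return [
        {"email_type": email_type, "email": sub["email"].tolist()}
        for email_type, sub in emails.groupby("email_type", sort=False)
    ]


def build_records(frame: pd.DataFrame) -> list:
    """Build one nested record per source_id, in order of first appearance."""
    records = []
    for source_id, group in frame.groupby("source_id", sort=False):
        first = group.iloc[0]  # names are assumed constant within a source_id
        records.append({
            "source_id": source_id,
            "address": group[["address_line_1", "city"]].drop_duplicates().to_dict("records"),
            "email": group_emails(group),
            "first_name": first["first_name"],
            "last_name": first["last_name"],
        })
    return records


result = build_records(df)
print(json.dumps(result, indent=3))
```

Using `drop_duplicates()` on a column subset, instead of the `frozenset` round trip, keeps the rows in their original order. If `source_id` is parsed as an integer, it would need converting with `int(source_id)` before `json.dumps`, since NumPy integers are not JSON serializable.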

