Convert JSON to a CSV file, with different delimiters for different attributes

Problem description

I want to convert a JSON file to CSV, with the attribute values separated by commas (,).

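phone is a multi-valued attribute, so two or more phone numbers should be separated by a pipe (|). parsed_address is a complex multi-valued attribute, so the values within each address should be separated by semicolons (;), and the addresses themselves by a pipe (|).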

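When I convert the JSON to CSV, I only get the comma delimiter; I can't get the separate delimiters for the multi-valued and complex multi-valued attributes.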

What I tried

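import pandas as pd

df = pd.read_json("file.json")               # nested fields are read as lists of dicts
df.to_csv("file.csv", sep=",", index=False)  # to_csv returns None when given a path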
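A minimal sketch of why this attempt keeps the commas but loses the other delimiters. It assumes the input is the list of records shown below, inlined here as a shortened, hypothetical `records` (one phone, one address) and loaded with `pd.DataFrame` instead of `pd.read_json` to avoid a file; for this input both give object-dtype columns holding Python lists. `to_csv` then writes each list cell as its Python `str()` form, so the whole list of dicts lands in one quoted field. Note also that `to_csv` returns `None` when it is given a path.

```python
import pandas as pd

# Shortened, hypothetical copy of the input record (one phone, one address).
records = [
    {
        "parsed_address": [
            {"address_type": "primary", "address_line_1": "abc", "city": "jersey",
             "state": "nj", "postal_code": "073024588", "country": "usa"},
        ],
        "phone": [{"phone": "+12177218280", "phone_type": "Mobile"}],
        "first_name": "saman",
        "last_name": "zonouz",
    }
]

# Nested lists stay as Python objects in object-dtype columns.
df = pd.DataFrame(records)
print(type(df.loc[0, "phone"]))  # <class 'list'>

# With no path, to_csv returns the CSV text instead of None.
# The phone and parsed_address cells come out as quoted Python reprs.
csv_text = df.to_csv(index=False)
print(csv_text)
```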

Input file (JSON)

[
   {
      "parsed_address":[
         {
            "address_type":"primary",
            "address_line_1":"abc",
            "city":"jersey",
            "state":"nj",
            "postal_code":"073024588",
            "country":"usa"
         },
         {
            "address_type":"work",
            "address_line_1":"xyz",
            "city":"ny",
            "state":"ns",
            "postal_code":"073024533",
            "country":"london"
         }
      ],
      "phone":[
         {
            "phone":"+12177218280",
            "phone_type":"Mobile"
         },
         {
            "phone":"+1217721340",
            "phone_type":"Work"
         }
      ],
      "first_name":"saman",
      "last_name":"zonouz"
   }
]

Expected output file (CSV)

first_name,last_name,phone,parsed_address
saman,zonouz,+12177218280|+1217721340,abc;jersey;nj;073024588;usa|xyz;ny;ns;073024533;london
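The expected format nests three delimiters: commas between columns, pipes between the values of a multi-valued attribute, and semicolons between the fields of one address. A minimal sketch, using only the standard-library `csv` module, of how a consumer could split such a row back apart; it assumes that no value itself contains `,`, `|` or `;`.

```python
import csv
import io

# The expected output row, as CSV text.
expected = (
    "first_name,last_name,phone,parsed_address\n"
    "saman,zonouz,+12177218280|+1217721340,"
    "abc;jersey;nj;073024588;usa|xyz;ny;ns;073024533;london\n"
)

# Level 1: commas separate the columns (handled by the csv module).
row = next(csv.DictReader(io.StringIO(expected)))

# Level 2: '|' separates the values of a multi-valued attribute.
phones = row["phone"].split("|")

# Level 3: ';' separates the fields inside one address.
addresses = [addr.split(";") for addr in row["parsed_address"].split("|")]

print(phones)     # ['+12177218280', '+1217721340']
print(addresses)
```

Because `|` and `;` are not escaped in this format, a value containing either character (for example an address line such as `a;b`) would make the row ambiguous.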

Tags: python json pandas csv

Solution


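I think the simplest approach is to build a new list of dictionaries with the right keys, joining the multi-valued attributes into delimited strings first: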

import pandas as pd

addresses = [
  {
      "parsed_address":[
         {
            "address_type":"primary",
            "address_line_1":"abc",
            "city":"jersey",
            "state":"nj",
            "postal_code":"073024588",
            "country":"usa"
         },
         {
            "address_type":"work",
            "address_line_1":"xyz",
            "city":"ny",
            "state":"ns",
            "postal_code":"073024533",
            "country":"london"
         }
      ],
      "phone":[
         {
            "phone":"+12177218280",
            "phone_type":"Mobile"
         },
         {
            "phone":"+1217721340",
            "phone_type":"Work"
         }
      ],
      "first_name":"saman",
      "last_name":"zonouz"
   }
]

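formatted_addr = []
for addr in addresses:
    new_dic = {}
    new_dic['first_name'] = addr['first_name']
    new_dic['last_name'] = addr['last_name']
    # Multi-valued attribute: join the phone numbers with '|'
    new_dic['phone'] = '|'.join(dic_phone['phone'] for dic_phone in addr['phone'])
    # Complex multi-valued attribute: join each address's fields (except
    # address_type) with ';', then join the addresses with '|'.
    # Field order follows the key order in the JSON (dicts keep insertion order in Python 3.7+).
    new_dic['parsed_address'] = '|'.join(
        ';'.join(dic_addr[key] for key in dic_addr if key != 'address_type')
        for dic_addr in addr['parsed_address'])
    formatted_addr.append(new_dic)

df = pd.DataFrame(formatted_addr)
# to_csv writes the file and returns None, so there is nothing to assign
df.to_csv('example.csv', sep=",", index=False)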
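A variant sketch that stays closer to the `pd.read_json` / `to_csv` attempt in the question: keep the DataFrame and rewrite only the two nested columns with `Series.apply`. The names `join_phones`, `join_addresses` and `ADDRESS_FIELDS` are hypothetical, and listing the address fields explicitly is an assumption that replaces the `!= 'address_type'` filter, so the field order no longer depends on the key order in the JSON.

```python
import pandas as pd

# Same record as the input, written compactly. With a real file this would be
# df = pd.read_json("file.json")  (the file name is a placeholder).
records = [{
    "parsed_address": [
        {"address_type": "primary", "address_line_1": "abc", "city": "jersey",
         "state": "nj", "postal_code": "073024588", "country": "usa"},
        {"address_type": "work", "address_line_1": "xyz", "city": "ny",
         "state": "ns", "postal_code": "073024533", "country": "london"},
    ],
    "phone": [{"phone": "+12177218280", "phone_type": "Mobile"},
              {"phone": "+1217721340", "phone_type": "Work"}],
    "first_name": "saman",
    "last_name": "zonouz",
}]
df = pd.DataFrame(records)

# Hypothetical list of address fields to keep, in output order
# (address_type is dropped, as in the loop-based solution).
ADDRESS_FIELDS = ["address_line_1", "city", "state", "postal_code", "country"]

def join_phones(phones):
    # Multi-valued attribute: phone numbers separated by '|'.
    return "|".join(p["phone"] for p in phones)

def join_addresses(addresses):
    # Complex multi-valued attribute: fields joined by ';', addresses by '|'.
    return "|".join(";".join(a[f] for f in ADDRESS_FIELDS) for a in addresses)

# Each cell holds a list of dicts; apply calls the helper once per cell.
df["phone"] = df["phone"].apply(join_phones)
df["parsed_address"] = df["parsed_address"].apply(join_addresses)

# Reorder columns to match the expected header; ',' is the default sep.
out = df[["first_name", "last_name", "phone", "parsed_address"]]
csv_text = out.to_csv(index=False)
print(csv_text)
```

Passing a path, such as `out.to_csv("example.csv", index=False)`, writes the file instead of returning the text. With the explicit field list, an address missing one of the keys raises a `KeyError` instead of silently shifting the remaining fields.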

Output:

first_name,last_name,phone,parsed_address
saman,zonouz,+12177218280|+1217721340,abc;jersey;nj;073024588;usa|xyz;ny;ns;073024533;london
