首页 > 解决方案 > 使用 python、json.loads 和 json_normalize 函数将 json 转换为 csv

问题描述

我正在尝试使用 Python 将 JSON 文件转换为 CSV 格式。我正在使用 JSON.loads() 函数,然后使用 json_normalize() 来展平对象。我想知道是否有更好的方法来做到这一点。

这是输入文件,一行表格:

{"ID": "02","Date": "2019-08-01","Total": 400,"QTY": 12,"Item": [{"NM": "0000000001","CD": "item_CD1","SRL": "25","Disc": [{"CD": "discount_CD1","Amount": 2}],"TxLns": {"TX": [{"TXNM": "000001-001","TXCD": "TX_CD1"}]}},{"NM": "0000000002","CD": "item_CD2","SRL": "26","Disc": [{"CD": "discount_CD2","Amount": 4}],"TxLns": {"TX": [{"TXNM": "000002-001","TXCD": "TX_CD2"}]}},{"NM": "0000000003","CD": "item_CD3","SRL": "27"}],"Cust": {"CustID": 10,"Email": "01@abc.com"},"Address": [{"FirstName": "firstname","LastName": "lastname","Address": "address"}]}

代码

import json
import pandas as pd
from pandas.io.json import json_normalize
data_final=pd.DataFrame()
with open("sample.json") as f:
    for line in f:
        json_obj = json.loads(line)
        ID = json_obj['ID']
        Item = json_obj['Item']
        dataMain = json_normalize(json_obj)
        dataMain=dataMain.drop(['Item','Address'], axis=1)
        #dataMain.to_csv("main.csv",index=False)
        dataItem = json_normalize(json_obj,'Item',['ID'])
        dataItem=dataItem.drop(['Disc','TxLns.TX'],axis=1)
        #dataItem.to_csv("Item.csv",index=False)
        dataDisc = pd.DataFrame()
        dataTx = pd.DataFrame()
        for rt in Item:
            NM=rt['NM']
            rt['ID'] = ID
            if 'Disc' in rt:
                data = json_normalize(rt, 'Disc', ['NM','ID'])
                dataDisc = dataDisc.append(data, sort=False)
            if 'TxLns' in rt:
                tx=rt['TxLns']
                tx['NM'] = NM
                tx['ID'] = ID
                if 'TX' in tx:
                    data = json_normalize(tx, 'TX', ['NM','ID'])
                    dataTx = dataTx.append(data, sort=False)
        dataDIS = pd.merge(dataItem, dataDisc, on=['NM','ID'],how='left')
        dataTX = pd.merge(dataDIS, dataTx, on=['NM','ID'],how='left')
        dataAddress = json_normalize(json_obj,'Address',['ID'])
        data_IT = pd.merge(dataMain, dataTX, on=['ID'])
        data_merge=pd.merge(data_IT,dataAddress, on=['ID'])
        data_final=data_final.append(data_merge,sort=False)
data_final=data_final.drop_duplicates(keep = 'first')
data_final.to_csv("data_merged.csv",index=False)

这是输出:

ID,Date,Total,QTY,Cust.CustID,Cust.Email,NM,CD_x,SRL,CD_y,Amount,TXNM,TXCD,FirstName,LastName,Address
02,2019-08-01,400,12,10,01@abc.com,0000000001,item_CD1,25,discount_CD1,2.0,000001-001,TX_CD1,firstname,lastname,address
02,2019-08-01,400,12,10,01@abc.com,0000000002,item_CD2,26,discount_CD2,4.0,000002-001,TX_CD2,firstname,lastname,address
02,2019-08-01,400,12,10,01@abc.com,0000000003,item_CD3,27,,,,,firstname,lastname,address

该代码目前工作正常。更好我的意思是:

它在时间和空间复杂性方面是否有效?如果这段代码必须在一个文件中处理大约 10K 条记录,这是优化的解决方案吗?

标签: jsonpython-3.xcsv

解决方案


推荐阅读