
Problem Description

I am new to Python.

I would like to know how to run the same process as the code below for multiple URLs.

# Code 1, works fine

import json

import pandas as pd
import requests

url = 'https://toyama.com.br/wp-json/wp/v2/assistencia?local=914&ramo=&_embed&per_page=100'
header = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
df = pd.read_json(url)
resp = requests.get(url, headers=header)
pandas_data_frame1 = df['acf'].apply(pd.Series)
pandas_data_frame1.to_csv('teste2.CSV', encoding='utf-8-sig')

# Code 2, which does not work perfectly (multiple URLs; importantly, some URLs exist and others do not, and I need to handle that)

url1 =['https://toyama.com.br/wp-json/wp/v2/assistencia?local=914&ramo=&_embed&per_page=100',
'https://toyama.com.br/wp-json/wp/v2/assistencia?local=800&ramo=&_embed&per_page=100',
'https://toyama.com.br/wp-json/wp/v2/assistencia?local=933&ramo=&_embed&per_page=100',
'https://toyama.com.br/wp-json/wp/v2/assistencia?local=844&ramo=&_embed&per_page=100',
'https://toyama.com.br/wp-json/wp/v2/assistencia?local=806&ramo=&_embed&per_page=100',
'https://toyama.com.br/wp-json/wp/v2/assistencia?local=1207&ramo=&_embed&per_page=100']

header = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

for links in url1:
    df = pd.read_json(links)
    resp1 = requests.get(links, headers=header)
    data = json.loads(resp1.text)
    for d in data:
        pandas_data_frame1 = df['acf'].apply(pd.Series)
        pandas_data_frame1.to_csv('teste2.CSV', encoding='utf-8-sig')

# Unfortunately, this only saves the content of the link 'https://toyama.com.br/wp-json/wp/v2/assistencia?local=1207&ramo=&_embed&per_page=100'

What I need is a single CSV with the JSON keys as columns, just like Code 1 produces.

Kind regards!

Tags: json, dataframe, csv

Solution


Your code works well for most of the process. You are loading the data into your workspace and creating a dataframe with the information you need.

However, each time you read a new link's information you replace the CSV file. That is why your code only saves the last link's information.
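To see the overwrite in isolation, here is a minimal sketch with two stand-in DataFrames (hypothetical data, playing the role of the per-URL results). Each `to_csv` call with the same filename replaces the whole file; opening in append mode after the first pass keeps the earlier rows:

```python
import pandas as pd

# Hypothetical stand-ins for the per-URL results of df['acf'].apply(pd.Series)
frame_a = pd.DataFrame({"nome": ["Loja A"], "cidade": ["Curitiba"]})
frame_b = pd.DataFrame({"nome": ["Loja B"], "cidade": ["Salvador"]})

# Overwriting (what the loop in Code 2 does): only frame_b survives.
frame_a.to_csv("teste2.CSV", index=False, encoding="utf-8-sig")
frame_b.to_csv("teste2.CSV", index=False, encoding="utf-8-sig")

# Appending instead: write mode with a header on the first pass, then append
# without repeating the header. Plain utf-8 here, because utf-8-sig would
# prepend a byte-order mark on every append.
for i, frame in enumerate([frame_a, frame_b]):
    frame.to_csv("appended.csv", mode="w" if i == 0 else "a",
                 header=(i == 0), index=False, encoding="utf-8")
```

This is only an illustration of the file-writing behaviour; the answer below keeps everything in memory instead, which also makes it easy to drop duplicates before saving.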

I believe there are many ways to solve it. One simple strategy is to insert a counter that tells the code when to process the information as you already do, and when to concatenate the dataframes into a single one.

Code:

# Links for scraping web data
url1 =['https://toyama.com.br/wp-json/wp/v2/assistencia?local=914&ramo=&_embed&per_page=100',
'https://toyama.com.br/wp-json/wp/v2/assistencia?local=800&ramo=&_embed&per_page=100',
'https://toyama.com.br/wp-json/wp/v2/assistencia?local=933&ramo=&_embed&per_page=100',
'https://toyama.com.br/wp-json/wp/v2/assistencia?local=844&ramo=&_embed&per_page=100',
'https://toyama.com.br/wp-json/wp/v2/assistencia?local=806&ramo=&_embed&per_page=100',
'https://toyama.com.br/wp-json/wp/v2/assistencia?local=1207&ramo=&_embed&per_page=100']


header = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

# Creating a counter to tell the code when to join dataframes.
# On the first pass we just create the dataframe; on later passes we concatenate into a single dataframe.
cont = 0

# Scrapping the Data
for links in url1:
    cont += 1
    
    # Printing which url link the code is reading
    print('loop:' + str(cont))
    
    df = pd.read_json(links)
    resp1 = requests.get(links, headers=header)
    data = json.loads(resp1.text)
    
    
    # First dataframe processing.
    if cont == 1:
        for d in data:
            complete_df = df['acf'].apply(pd.Series)
    
    # Processing the remaining dataframes
    else:
        for d in data:
            others_df = df['acf'].apply(pd.Series)
            complete_df = pd.concat([complete_df, others_df])

# Removing duplicates from the dataframe. I am not sure why, but apparently the code reads a few JSON files more than once.
complete_df = complete_df.drop_duplicates()

# Saving CSV file.
complete_df.to_csv('teste2.CSV', encoding='utf-8-sig')
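As an alternative to the counter, a common pattern is to collect the per-URL frames in a list and concatenate once at the end. It also makes it natural to skip the URLs that do not exist, which the question mentions. A sketch under those assumptions (the function names are mine, and it uses the standard-library `urllib` in place of `requests`, which would work the same way):

```python
import json
import urllib.error
import urllib.request

import pandas as pd

HEADER = {'User-Agent': 'Mozilla/5.0'}

def fetch_acf_frame(url):
    """Return the expanded 'acf' columns for one URL, or None when the
    URL is missing or unreachable (some of the locations do not exist)."""
    req = urllib.request.Request(url, headers=HEADER)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            data = json.load(resp)
    except (urllib.error.URLError, ValueError):
        return None
    df = pd.DataFrame(data)
    if 'acf' not in df.columns:
        return None
    return df['acf'].apply(pd.Series)

def combine_frames(frames):
    """Concatenate the per-URL frames, ignoring failed fetches (None),
    and drop duplicate rows."""
    frames = [f for f in frames if f is not None]
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True).drop_duplicates()
```

Usage would then be `combine_frames(fetch_acf_frame(u) for u in url1).to_csv('teste2.CSV', encoding='utf-8-sig')`, with no counter needed.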

I hope this helps you.

