How can I scrape data from a website and update a file with new information every day while keeping the old data?

Problem description

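I originally planned to use a CSV file, but that requires me to open VS Code manually every day and run my script to add the data to the CSV file, and each run replaces the old data I saved previously.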

Tags: python

Solution


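If the dataset you scrape is small, scrape it into a list of dictionaries with one dictionary per row you want to save, using the structure [{<column1>: <data>, <column2>: <data>, ...}, ...]. Then append those rows to the CSV file with the function below by calling append_csv_dict(<path_to_your_csv>, <your_dictionary>). Because the file is opened in append mode ('a') rather than write mode ('w'), each run adds its rows after the existing ones instead of overwriting them.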

import csv

def append_csv_dict(path, data):
    '''
    Append rows to a CSV file, using the dictionary keys as column headers.
    Args:
        path (str): Path to the CSV file
        data (dict or list): A dictionary, or a list of dictionaries, whose keys
                             are the column headers and whose values are the column data
    '''
    if not data:
        # nothing to write (empty dict or empty list)
        return
    # newline='' is what the csv module expects; without it, blank lines
    # appear between rows on Windows
    with open(path, 'a', newline='', encoding='utf-8') as file:
        # take the field names from the dictionary, or from the first item of the list
        fieldnames = list(data.keys()) if isinstance(data, dict) else list(data[0].keys())
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        # in append mode the position starts at the end of the file,
        # so 0 means the file is new (or empty): write the header only then
        if file.tell() == 0:
            writer.writeheader()
        if isinstance(data, dict):
            # write a single row
            writer.writerow(data)
        elif isinstance(data, list):
            # write several rows from a list
            writer.writerows(data)

# some example data; if you only scrape one row per day, you can pass a single dictionary instead of a list
scraped_data = [
    {
        'first_name': 'John',
        'last_name': 'Do',
        'age': 31
    },
    {
        'first_name': 'Jane',
        'last_name': 'Do',
        'age': 33
    },
    {
        'first_name': 'Foo',
        'last_name': 'Bar',
        'age': 58
    }
]

append_csv_dict('./scrape.csv', scraped_data)

Output (scrape.csv):

first_name,last_name,age
John,Do,31
Jane,Do,33
Foo,Bar,58
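Running the script again on a later day should add rows below the existing ones rather than replace them. The sketch below simulates two daily runs against a temporary file and reads everything back to confirm that the first day's row is still there. The helper names append_daily_rows and read_rows, the scraped_on date column, and the example dates are all hypothetical additions for illustration, not part of the answer above; the date column is only there so rows from different days can be told apart.

```python
import csv
import os
import tempfile
from datetime import date


def append_daily_rows(path, rows, scraped_on):
    """Append rows to a CSV file, tagging each row with the date it was scraped.

    Hypothetical helper: the 'scraped_on' column is an illustrative addition.
    """
    if not rows:
        return
    # add the date to a copy of each row so the caller's dictionaries are not modified
    tagged = [{**row, 'scraped_on': scraped_on.isoformat()} for row in rows]
    fieldnames = list(tagged[0].keys())
    # the header is only needed when the file does not exist yet or is empty
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # 'a' appends to the end of the file; 'w' would truncate it and lose old rows
    with open(path, 'a', newline='', encoding='utf-8') as file:
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerows(tagged)


def read_rows(path):
    """Read the whole CSV back as a list of dictionaries (all values are strings)."""
    with open(path, newline='', encoding='utf-8') as file:
        return list(csv.DictReader(file))


with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, 'scrape.csv')
    # simulated "day 1" and "day 2" runs with example dates, each appending new rows
    append_daily_rows(csv_path, [{'first_name': 'John', 'last_name': 'Do', 'age': 31}], date(2024, 1, 1))
    append_daily_rows(csv_path, [{'first_name': 'Jane', 'last_name': 'Do', 'age': 33}], date(2024, 1, 2))
    # read back before the temporary directory is deleted
    all_rows = read_rows(csv_path)

for row in all_rows:
    print(row['first_name'], row['scraped_on'])
```

Because the header is written only once, every run should produce dictionaries with the same keys in the same order; a run with a different key order would put values under the wrong columns, and a run with an extra key makes csv.DictWriter raise a ValueError.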
