How do I add each new DataFrame to an already created CSV?

Question

My problem is that only the most recent URL request gets saved. How can I save all of the responses? I tried df.to_csv('complete.csv', 'a'), but that produces a garbled file.

# imports
import requests
from bs4 import BeautifulSoup
import pandas as pd

# main code
with open('list.txt', 'r') as f_in:
    for line in map(str.strip, f_in):
        if not line:
            continue

        response = requests.get(line)
        data = response.text
        soup = BeautifulSoup(data, 'html.parser')

        linecodes = []
        partnos = []

        for tbody in soup.select('tbody[id^="listingcontainer"]'):
            tmp = tbody.find('span', class_='listing-final-manufacturer')
            linecodes.append(tmp.text if tmp else '-')

            tmp = tbody.find('span', class_='listing-final-partnumber as-link-if-js buyers-guide-color')
            partnos.append(tmp.text if tmp else '-')

        # create dataframe
        df = pd.DataFrame(zip(linecodes,partnos), columns=['linecode', 'partno'])

        # save to csv
        df.to_csv('complete.csv')

        print(df)

list.txt

https://www.rockauto.com/en/catalog/ford,2010,f-150,6.2l+v8,1447337,brake+&+wheel+hub,brake+pad,1684
https://www.rockauto.com/en/catalog/ford,2015,f-150,5.0l+v8,3308775,brake+&+wheel+hub,brake+pad,1684

Tags: pandas, dataframe, csv, beautifulsoup, python-requests

Answer


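You are saving the DataFrame inside the loop, so each iteration overwrites the file written by the previous one. Instead, add each iteration's DataFrame to a combined DataFrame, and write that combined DataFrame to the CSV once, after the loop has finished. Something like: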

# imports
import requests
from bs4 import BeautifulSoup
import pandas as pd

# main code
with open('list.txt', 'r') as f_in:
    final_df = pd.DataFrame()
    for line in map(str.strip, f_in):
        if not line:
            continue

        response = requests.get(line)
        data = response.text
        soup = BeautifulSoup(data, 'html.parser')

        linecodes = []
        partnos = []

        for tbody in soup.select('tbody[id^="listingcontainer"]'):
            tmp = tbody.find('span', class_='listing-final-manufacturer')
            linecodes.append(tmp.text if tmp else '-')

            tmp = tbody.find('span', class_='listing-final-partnumber as-link-if-js buyers-guide-color')
            partnos.append(tmp.text if tmp else '-')

        # create dataframe
        df = pd.DataFrame(zip(linecodes,partnos), columns=['linecode', 'partno'])
        print(df)
        # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
        final_df = pd.concat([final_df, df], ignore_index=True)

    # save to csv
    final_df.to_csv('complete.csv')

    print(final_df)
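
If you would rather write to the file after every request, as the question attempted, appending can also work. The following is a minimal sketch, not part of the original answer. The helper name append_to_csv is hypothetical. It assumes the file mode is passed by keyword: the second positional parameter of to_csv has been sep, so the 'a' in df.to_csv('complete.csv', 'a') was most likely treated as the column separator rather than as append mode, which would explain the garbled file.

```python
import os

import pandas as pd


def append_to_csv(df, path):
    """Append df to the CSV at path, writing the header row only once.

    Hypothetical helper for illustration; not taken from the original answer.
    """
    # Write the header only when the file does not exist yet or is empty,
    # so later appends do not repeat the column names in the middle of the data.
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0

    # mode='a' opens the file for appending; index=False leaves out the
    # per-DataFrame row index, which would restart at 0 on every append.
    df.to_csv(path, mode='a', header=write_header, index=False)
```

Inside the loop, append_to_csv(df, 'complete.csv') would take the place of df.to_csv('complete.csv'). Because the file is never truncated, rows from an earlier run remain in it, so delete complete.csv before starting a fresh run.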
