How to save to a new CSV file on each pass through a per-page loop

Problem description

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
import pandas as pd
from pymongo import MongoClient
import tkinter as tk
from tkinter import filedialog
from pandas import DataFrame
from datetime import datetime

# launch WMS
url = "https://inserturlhere.com/solution/login.htm"
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get(url)

#Login start
username = driver.find_element_by_id("username")
username.clear()
username.send_keys("xxx")

password = driver.find_element_by_id("password")
password.clear()
password.send_keys("xxx")

driver.find_element_by_id("loginButton").click()
#login end


# set variables
pageSize = 200
pageNo = 0
currentView = 33362
entityName = "Item"
now = datetime.now()

def get_filename_datetime():
    return str(entityName) + "-" + str(now.strftime("%d%m%Y-%H%M%S")) +".csv"

for pageNo in range(10):
    url2 = "https://inserturlhere.com/solution/entitylist.htm?entityName=Item&tabName=Item&pageNo={pageNo}&pageSize={pageSize}&currentViewId={currentView}".format(pageNo=pageNo, pageSize=pageSize, currentView=currentView)
    # open shipments page
    driver.get(url2)
    html = driver.page_source
    dfs = pd.read_html(html, attrs={"class":"roundedTable"}, header=5)
    for df in dfs:
        df.dropna(how="all", axis="columns", inplace=True)
        df.drop({"No", "Process Action"}, axis="columns", inplace=True)
        df.dropna(how='all', axis=0, inplace=True)
        df.append(dfs)
        df.to_csv(get_filename_datetime(), index=False)

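Hi all. How can I modify the code above so that it saves a new CSV for each page of the loop? Right now it only saves a single file, containing the last page.

Let me know if any other data is needed! Thanks!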

Tags: python, pandas, selenium, web-scraping, beautifulsoup

Solution


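The file name is built from now, which is set once before the loop, so every page is written to the same name and overwrites the previous one. You can add pageNo to the file name as a unique number: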

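# get_filename_datetime() already ends in ".csv", so insert pageNo before the extension
df.to_csv(get_filename_datetime().replace(".csv", f"_{pageNo}.csv"), index=False)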
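
As a minimal sketch of the same idea, the file name could be built by a small helper that takes the page number as an argument. The helper name page_filename, the _p suffix, and the fixed timestamp below are assumptions made for illustration and are not part of the original code.

```python
from datetime import datetime


def page_filename(entity_name, page_no, stamp):
    """Build a per-page CSV name (hypothetical helper, not from the question)."""
    return f"{entity_name}-{stamp.strftime('%d%m%Y-%H%M%S')}_p{page_no}.csv"


# One timestamp for the whole run keeps the files of a run grouped together;
# the page number alone is what makes each name unique.
run_stamp = datetime(2024, 2, 1, 3, 4, 5)  # fixed for illustration; use datetime.now() in practice
names = [page_filename("Item", page_no, run_stamp) for page_no in range(3)]
print(names)

# Inside the scraping loop the call would then be (df, entityName, pageNo, now as in the question):
# df.to_csv(page_filename(entityName, pageNo, now), index=False)
```

Passing the page number in explicitly, rather than relying on the clock, means the names stay distinct even when several pages are written within the same second.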
