How to save newly scraped data to an existing CSV

Problem description

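I scrape live data with Python in Jupyter Notebook and run the script from Task Scheduler once a month. Everything works: the data is saved to a CSV file and to SQL Server. The problem is that every time the data is updated, the CSV file does not change to the new data. Here is my code: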

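import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://www.indexmundi.com/commodities/?commodity=potassium-chloride&months=300"
r = requests.get(url)
r.raise_for_status()  # fail loudly if the page could not be downloaded
html = r.text
soup = BeautifulSoup(html, "html.parser")  # name a parser explicitly to avoid the bs4 warning
table = soup.find('table', {"class": "tblData"})
rows = table.find_all('tr')
data = []
for row in rows[1:]:  # skip the header row
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])  # keep only non-empty cells
result = pd.DataFrame(data, columns=['month', 'price', 'change'])
result['month'] = pd.to_datetime(result["month"])
# mode='w' truncates and rewrites kcl.csv on every run; the relative path
# is resolved against the process's current working directory
result.to_csv("kcl.csv", index=False, mode='w')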

df = pd.read_csv("kcl.csv")
pd.set_option('display.max_rows', df.shape[0]+1)
print(df)
import pyodbc
from sqlalchemy import create_engine

server = r'MSHULHAN\SQLEXPRESS'  # raw string avoids the invalid-escape warning for "\S"

database = 'daming'

engine = create_engine('mssql+pyodbc://' + server + '/' + database + '?trusted_connection=yes&driver=ODBC+Driver+13+for+SQL+Server')

#engine = create_engine('mysql://root:@localhost/daming') # enter your password and database names here

col_names = ["month", "price", "change"]
# names= supplies the column names, so skiprows=1 skips the header row already in kcl.csv
df = pd.read_csv("kcl.csv", sep=',', quotechar="'", encoding='utf8', names=col_names, skiprows=1)
# if_exists='replace' drops the existing SQL table 'kcl' and recreates it before inserting
df.to_sql('kcl', con=engine, index=False, if_exists='replace')
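The script writes and reads kcl.csv through a relative path, which Python resolves against the current working directory of the process. If Task Scheduler starts the script in a different directory than the notebook (commonly C:\Windows\System32 when the task's "Start in" field is empty; this is an assumption, since the question does not show the task settings), each run may rewrite a kcl.csv somewhere else while the file being checked never changes. A minimal sketch that pins the file next to the script; `csv_path` is a hypothetical helper, and `__file__` is not defined inside a Jupyter cell, so it falls back to the working directory there.

```python
from pathlib import Path


def csv_path(name: str = "kcl.csv") -> Path:
    """Return an absolute path for `name` in the same folder as this script.

    Hypothetical helper. When __file__ is not defined (e.g. in a Jupyter
    cell), fall back to the current working directory.
    """
    try:
        base = Path(__file__).resolve().parent
    except NameError:
        base = Path.cwd()
    return base / name


if __name__ == "__main__":
    # In the original script, the relative "kcl.csv" would become:
    #   result.to_csv(csv_path(), index=False, mode="w")
    #   df = pd.read_csv(csv_path())
    print(csv_path())
```

Printing the resolved path from the scheduled run is a quick way to confirm which file the task actually writes.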

With SQL Server I can pass if_exists='replace' to to_sql, but with the CSV it does nothing. Please help, thanks in advance!
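DataFrame.to_csv has no if_exists parameter; its mode argument is the closest counterpart: mode='w' (the default) overwrites the file, like 'replace', and mode='a' appends, like 'append'. Because the scraper downloads the full 300-month history on every run, plain appending would duplicate rows. A minimal sketch that merges new rows into the existing CSV and keeps one row per month instead; `update_csv` is a hypothetical name, and letting the newest scrape win for a repeated month is an assumption about what "new data" should mean here.

```python
import os

import pandas as pd


def update_csv(new: pd.DataFrame, path: str, key: str = "month") -> pd.DataFrame:
    """Merge freshly scraped rows into the CSV at `path`, one row per `key`.

    Hypothetical helper: when a key already exists in the file, the row
    from `new` replaces it. Returns the merged frame that was written.
    """
    new = new.copy()
    new[key] = pd.to_datetime(new[key])
    if os.path.exists(path):
        old = pd.read_csv(path, parse_dates=[key])
        combined = pd.concat([old, new], ignore_index=True)
    else:
        combined = new
    # keep="last" keeps the row that came from `new` for duplicated keys
    combined = (
        combined.drop_duplicates(subset=key, keep="last")
        .sort_values(key)
        .reset_index(drop=True)
    )
    # mode="w" rewrites the whole file, the CSV analogue of if_exists="replace"
    combined.to_csv(path, index=False, mode="w")
    return combined
```

In the original script this would replace the to_csv call, e.g. update_csv(result, "kcl.csv"), ideally with an absolute path so the scheduled run and the notebook touch the same file.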

Tags: python, pandas, csv, screen-scraping
