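python - How to stop an Excel sheet from being overwritten; I want all the data in one sheet
Problem
When the data is written to a single Excel sheet inside a for loop, each write overwrites the sheet. To stop that, I need the newly collected data from each page to be added to the same sheet instead (pandas).
So how do I do that?
The code: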
import ctypes

import pandas as pd
import requests
from bs4 import BeautifulSoup

# url (the listing URL, without the page number) is defined elsewhere in the script
ih = input('pages: ')

def test():
    for page in range(1, int(ih) + 1):  # + 1 so the last page entered is included
        req = requests.get(url + str(page))
        soup = BeautifulSoup(req.content, 'html.parser')
        g_data = soup.find_all('span', {"class": "b-card b-card-mod-h vehicle"})
        g_price = soup.find_all('div', {"class": "b-card--el-vehicle-price"})
        g_mile = soup.find_all('p', {"class": "b-card--el-brief-details"})
        g_name = soup.find_all('p', {"class": "b-card--el-description"})
        g_user = soup.find_all('a', {"class": "b-card--el-agency-title"})
        g_link = soup.find_all('div', {"class": "b-card--el-inner-wrapper"})
        # Extract the text (or link target) of every matched element
        m_price = [item.text for item in g_price]
        m_mile = [item.text for item in g_mile]
        m_user = [item.text for item in g_user]
        m_name = [item.text for item in g_name]
        m_link = [item.a["href"] for item in g_link]
        m_extensions = ['' for item in g_link]
        # One column per field; shorter Series are padded with NaN by concat
        s1 = pd.Series(m_name, name='Vehicle Name')
        s2 = pd.Series(m_mile, name='Mileage')
        s3 = pd.Series(m_price, name='Price')
        s4 = pd.Series(m_user, name='User')
        s5 = pd.Series(m_link, name='Link')
        s6 = pd.Series(m_extensions, name='Site')
        # s6 + s5 loses both names, so give the combined column a label
        df = pd.concat([s1, s2, s3, s4, (s6 + s5).rename('Link')], axis=1)
        # to_excel() rewrites the whole file, so each page replaces the
        # previous one: this is the overwrite problem described above
        df.to_excel('hello_world.xlsx', index=False)
        print(f'[+]Writing Data from page {page}')
        ctypes.windll.kernel32.SetConsoleTitleW(f'[+]Writing Data from page {page}')  # Windows only
    print('[=]Written Data')

# Write the data.
test()
If anyone can help, thanks!
Solution
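You can use openpyxl to get the last used row of the worksheet, then use the DataFrame's to_excel
method to write the data starting at that row. Note that you have to set writer.sheets;
it tells pandas which worksheets already exist, so the new rows go into the existing sheet instead of the earlier data being lost when the workbook is saved.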
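To see why max_row can be passed straight to startrow, here is a small sketch, assuming openpyxl is installed (the file demo.xlsx in a temporary directory is hypothetical, used only for the demo). openpyxl counts rows from 1, while startrow in to_excel counts from 0, so the number of rows already in use is also the 0-based index of the first empty row.

```python
import os
import tempfile

import openpyxl
import pandas as pd

# Hypothetical demo file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")

# Header row + 2 data rows -> 3 used rows in the sheet
pd.DataFrame({"Price": [100, 200]}).to_excel(path, index=False, sheet_name="Sheet1")

wb = openpyxl.load_workbook(path)
lastrow = wb["Sheet1"].max_row  # 1-based count of used rows
print(lastrow)  # 3

# startrow is 0-based, so startrow=lastrow targets Excel row lastrow + 1,
# i.e. the first empty row directly below the existing data
print(f"next write starts at Excel row {lastrow + 1}")
```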
Add this function to your code:
import os

import openpyxl
import pandas as pd

# Written for older pandas: assigning writer.book and calling writer.save()
# were deprecated in pandas 1.5 and removed in 2.0.
def AppendExcel(df, filename):
    sheetname = "Sheet1"
    if not os.path.isfile(filename):  # create a new file, with the header row
        df.to_excel(filename, startrow=0, index=False, sheet_name=sheetname)
    else:  # append below the existing rows
        wb = openpyxl.load_workbook(filename)
        writer = pd.ExcelWriter(filename, engine='openpyxl')
        writer.book = wb
        writer.sheets = dict((ws.title, ws) for ws in wb.worksheets)  # need this to prevent overwrite
        lastrow = wb[sheetname].max_row
        df.to_excel(writer, startrow=lastrow, index=False, header=False, sheet_name=sheetname)
        writer.save()
Then, in your loop, write the data with:
AppendExcel(df, 'hello_world.xlsx')
This code is untested, so you may need to adjust it a bit.
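On current pandas versions, assigning writer.book and calling writer.save() no longer work. A sketch of the same idea for pandas 1.4 or newer, assuming openpyxl is installed, lets ExcelWriter open the existing file in append mode (mode="a") and overlay the new rows onto the existing sheet (if_sheet_exists="overlay"). The function name append_excel and its sheet_name default are illustrative, not part of any library.

```python
import os

import pandas as pd


def append_excel(df, filename, sheet_name="Sheet1"):
    """Append df below the existing rows of sheet_name in filename."""
    if not os.path.isfile(filename):
        # First call: create the file, including the header row
        df.to_excel(filename, index=False, sheet_name=sheet_name)
        return
    # mode="a" opens the existing workbook instead of replacing it;
    # if_sheet_exists="overlay" writes into the existing sheet (pandas >= 1.4)
    with pd.ExcelWriter(filename, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        lastrow = writer.book[sheet_name].max_row  # 1-based count of used rows
        # startrow is 0-based, so this lands on the first empty row;
        # header=False avoids repeating the column names
        df.to_excel(writer, startrow=lastrow, index=False, header=False,
                    sheet_name=sheet_name)
```

In the question's loop, append_excel(df, 'hello_world.xlsx') would take the place of the write step; the context manager saves and closes the file, so no explicit save call is needed.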