pandas - How do I append each new DataFrame to a CSV that has already been created?
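Problem description
My problem is that only the most recent URL request gets saved. How can I save all of the responses? I tried df.to_csv('complete.csv', 'a'), but that creates a messy file.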
# imports
import requests
from bs4 import BeautifulSoup
import pandas as pd
# main code
with open('list.txt', 'r') as f_in:
    for line in map(str.strip, f_in):
        if not line:
            continue
        response = requests.get(line)
        data = response.text
        soup = BeautifulSoup(data, 'html.parser')
        linecodes = []
        partnos = []
        for tbody in soup.select('tbody[id^="listingcontainer"]'):
            tmp = tbody.find('span', class_='listing-final-manufacturer')
            linecodes.append(tmp.text if tmp else '-')
            tmp = tbody.find('span', class_='listing-final-partnumber as-link-if-js buyers-guide-color')
            partnos.append(tmp.text if tmp else '-')
        # create dataframe
        df = pd.DataFrame(zip(linecodes,partnos), columns=['linecode', 'partno'])
        # save to csv
        df.to_csv('complete.csv')
        print(df)
list.txt
https://www.rockauto.com/en/catalog/ford,2010,f-150,6.2l+v8,1447337,brake+&+wheel+hub,brake+pad,1684
https://www.rockauto.com/en/catalog/ford,2015,f-150,5.0l+v8,3308775,brake+&+wheel+hub,brake+pad,1684
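Solution
You are saving the DataFrame on every iteration, and each save simply overwrites the previous one. Instead, accumulate the DataFrame from each iteration and save the combined DataFrame once, after the loop has finished. Something like: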
# imports
import requests
from bs4 import BeautifulSoup
import pandas as pd
# main code
frames = []  # one DataFrame per URL in list.txt
with open('list.txt', 'r') as f_in:
    for line in map(str.strip, f_in):
        if not line:
            continue
        response = requests.get(line)
        data = response.text
        soup = BeautifulSoup(data, 'html.parser')
        linecodes = []
        partnos = []
        for tbody in soup.select('tbody[id^="listingcontainer"]'):
            tmp = tbody.find('span', class_='listing-final-manufacturer')
            linecodes.append(tmp.text if tmp else '-')
            tmp = tbody.find('span', class_='listing-final-partnumber as-link-if-js buyers-guide-color')
            partnos.append(tmp.text if tmp else '-')
        # create dataframe
        df = pd.DataFrame(zip(linecodes, partnos), columns=['linecode', 'partno'])
        print(df)
        # keep this URL's rows for the final result
        frames.append(df)
# combine everything once; DataFrame.append was removed in pandas 2.0, so use pd.concat
if frames:
    final_df = pd.concat(frames, ignore_index=True)
else:
    # list.txt had no URLs, so there is nothing to concatenate
    final_df = pd.DataFrame(columns=['linecode', 'partno'])
# save to csv
final_df.to_csv('complete.csv')
print(final_df)
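If you would rather write to the file inside the loop, as the question attempted, the append mode has to be passed by keyword. In df.to_csv('complete.csv', 'a') the second positional argument is sep, not mode, so in pandas versions that accept it positionally the letter 'a' is used as the column separator, which is a likely cause of the messy file. Below is a minimal sketch of the append approach; the helper name append_to_csv is hypothetical, and dropping the index column is an assumption made to keep the appended rows uniform.

```python
import os

import pandas as pd


def append_to_csv(df, path):
    """Append df to the CSV at path, writing the header row only once.

    Hypothetical helper, not part of pandas.
    """
    # Write the header only when the file does not exist yet or is empty,
    # so later calls add data rows without repeating the column names.
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # mode='a' opens the file for appending instead of overwriting it.
    df.to_csv(path, mode='a', header=write_header, index=False)
```

Inside the loop you would then call append_to_csv(df, 'complete.csv') in place of df.to_csv('complete.csv'). Note that running the script again keeps appending to the existing file, so delete complete.csv first if you want a fresh result.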