python - I want to scrape multiple pages, but I only get the result for the last URL. Why?
Problem description
Why does the output contain only the result for the last URL? Is there something wrong with my code?
import requests as uReq
from bs4 import BeautifulSoup as soup
import numpy as np
# Can I use a while loop instead of for?
for page in np.arange(1,15):
    url = uReq.get('https://www.myanmarbusiness-directory.com/en/categories-index/car-wheels-tyres-tubes-dealers/page{}.html?city=%E1%80%99%E1%80%9B%E1%80%99%E1%80%B9%E1%80%B8%E1%80%80%E1%80%AF%E1%80%94%E1%80%B9%E1%80%B8%E1%81%BF%E1%80%99%E1%80%AD%E1%80%B3%E1%82%95%E1%80%94%E1%80%9A%E1%80%B9'.format(page)).text
# I have used a for loop, but the result is only the last URL
page_soup = soup(url,"html.parser")
info = page_soup.findAll("div",{"class: ","row detail_row"})
# Does the output of all the URLs go into one file?
filename = "wheel.csv"
file = open(filename,"w",encoding="utf-8")
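Solution
Check the indentation of the code that follows the for statement. Anything that is not indented under the loop runs only once, after the loop has finished. In the meantime the variable url is overwritten on every iteration, so only the last page is kept.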
import requests as uReq
from bs4 import BeautifulSoup as soup
import numpy as np
for page in np.arange(1,15):
    url = uReq.get('https://www.myanmarbusiness-directory.com/en/categories-index/car-wheels-tyres-tubes-dealers/page{}.html?city=%E1%80%99%E1%80%9B%E1%80%99%E1%80%B9%E1%80%B8%E1%80%80%E1%80%AF%E1%80%94%E1%80%B9%E1%80%B8%E1%81%BF%E1%80%99%E1%80%AD%E1%80%B3%E1%82%95%E1%80%94%E1%80%9A%E1%80%B9'.format(page)).text
    # everything below is indented, so it runs once per page (14 times for np.arange(1,15))
    page_soup = soup(url,"html.parser")
    # attrs must be a dict ({"class": ...}); the original {"class: ", ...} was a set
    info = page_soup.findAll("div",{"class": "row detail_row"})
    # append the results to the csv file ("a" mode keeps earlier pages)
    filename = "wheel.csv"
    file = open(filename,"a",encoding="utf-8")
    ... # code for writing in the csv file
    file.close()
You will then find the results for every page in the file. Note that you should also close the file so that its contents are saved.
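As a complement to the answer above, here is a minimal offline sketch of the same idea: do the per-page work inside the loop body, collect the rows, and write them all to one CSV. The names `fetch_page`, `collect_rows` and `write_rows` are hypothetical and not part of the original code. `fetch_page` stands in for `uReq.get(...).text` so the example runs without network access, and the BeautifulSoup parsing step is left out for the same reason.

```python
import csv
import io


def fetch_page(page):
    # Hypothetical stand-in for uReq.get(url.format(page)).text,
    # used here so the sketch runs without a network connection.
    return "content of page {}".format(page)


def collect_rows(pages, fetch=fetch_page):
    rows = []
    for page in pages:
        html = fetch(page)
        # Everything that uses `html` is indented under the loop,
        # so each page is handled before `html` is overwritten.
        rows.append([page, html])
    return rows


def write_rows(rows, out):
    # Write a header plus one line per page to any file-like object.
    writer = csv.writer(out)
    writer.writerow(["page", "content"])
    writer.writerows(rows)


rows = collect_rows(range(1, 15))
buffer = io.StringIO()  # swap for open("wheel.csv", "w", newline="", encoding="utf-8")
write_rows(rows, buffer)
```

With a real file, wrapping the `open(...)` call in a `with` statement closes the file automatically, so a separate `file.close()` is not needed.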