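python - Iterating the same scraper code over several URLs
Problem description
Now I need to repeat the same code over several subdomains. This is my current code:
I edited my code to better reflect my problem: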
import requests
from bs4 import BeautifulSoup as bs
urls = [
    "https://www.pedidosya.com.ar/restaurantes/buenos-aires/recoleta/empanadas-delivery",
    "https://www.pedidosya.com.ar/restaurantes/buenos-aires/almagro/empanadas-delivery",
    "https://www.pedidosya.com.ar/restaurantes/buenos-aires/palermo/empanadas-delivery",
    "https://www.pedidosya.com.ar/restaurantes/buenos-aires/villa-crespo/empanadas-delivery",
    "https://www.pedidosya.com.ar/restaurantes/buenos-aires/balvanera/empanadas-delivery",
]
for url in urls:
    base = url + "?bt=RESTAURANT&page="  # the page number is appended in the loop
    page = 1
    restaurants = []
    while True:
        soup = bs(requests.get(base + str(page)).text, "html.parser")
        page += 1
        sections = soup.find_all("section", attrs={"class": "restaurantData"})
        if not sections:  # no results on this page: stop paginating this URL
            break
        for section in sections:
            for elem in section.find_all("a", href=True, attrs={"class": "arrivalName"}):
                restaurants.append({"name": elem.text, "url": elem["href"]})
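The loop above keeps requesting page 1, 2, 3, ... until a page comes back with no `restaurantData` sections. A minimal sketch of that stop-on-empty-page pattern, with the network and HTML parsing pushed into a caller-supplied `fetch_page` function so the loop itself can be tested offline. `page_url`, `iter_restaurants`, `fetch_page` and the `max_pages` cap are hypothetical names and choices; the `bt=RESTAURANT` and `page` query parameters are the ones the answer below appends, and `urllib.parse.urlencode` produces the same query string as the manual concatenation.

```python
from typing import Callable, Iterator, List, Tuple
from urllib.parse import urlencode

# (name, href) pairs found on one results page (hypothetical shape)
PageItems = List[Tuple[str, str]]


def page_url(listing_url: str, page: int) -> str:
    """Build the paginated URL, e.g. ...empanadas-delivery?bt=RESTAURANT&page=2."""
    return listing_url + "?" + urlencode({"bt": "RESTAURANT", "page": page})


def iter_restaurants(
    listing_url: str,
    fetch_page: Callable[[str], PageItems],
    max_pages: int = 100,
) -> Iterator[Tuple[str, str, str]]:
    """Yield (page_url, name, href) until a page returns no restaurants.

    fetch_page is supplied by the caller. max_pages is an assumed safety cap
    so a site that never returns an empty page cannot loop forever.
    """
    for page in range(1, max_pages + 1):
        url = page_url(listing_url, page)
        items = fetch_page(url)
        if not items:  # empty page: no more results for this listing
            return
        for name, href in items:
            yield url, name, href
```

In the real scraper, `fetch_page` would wrap the `requests.get` call plus the two `find_all` calls and return `(elem.text, elem["href"])` pairs; keeping it separate means the pagination logic can be checked without hitting the site.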
I need a .csv file with the following columns:
[(url, name of all restaurants in each url, url for each restaurant)]
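A minimal sketch of that layout, assuming the scraper has already collected, for each listing URL, a list of `{"name": ..., "url": ...}` dicts like the `restaurants` list in the code above. `write_restaurants_csv` and the column names `listing_url`, `name`, `url` are hypothetical; `csv.DictWriter` is from the standard library.

```python
import csv
import io
from typing import Dict, List, TextIO


def write_restaurants_csv(out: TextIO, results: Dict[str, List[Dict[str, str]]]) -> None:
    """Write a header, then one row per restaurant: listing URL, name, restaurant URL.

    results maps each listing URL to the restaurants scraped from it (assumed shape).
    """
    writer = csv.DictWriter(out, fieldnames=["listing_url", "name", "url"])
    writer.writeheader()
    for listing_url, restaurants in results.items():
        for restaurant in restaurants:
            writer.writerow({
                "listing_url": listing_url,  # the page the restaurant was found on
                "name": restaurant["name"],
                "url": restaurant["url"],
            })


if __name__ == "__main__":
    # Demo with an in-memory buffer; for a real file use
    # open("output.csv", "w", newline="", encoding="utf-8").
    buf = io.StringIO()
    write_restaurants_csv(buf, {
        "https://example.com/listing": [
            {"name": "Example A", "url": "https://example.com/a"},
        ],
    })
    print(buf.getvalue(), end="")
```

Using the `csv` module rather than joining strings by hand also means a restaurant name that contains a comma gets quoted correctly.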
Solution
I think this is what you're looking for:
import csv

import requests
from bs4 import BeautifulSoup as bs

urls = [
    "https://www.pedidosya.com.ar/restaurantes/buenos-aires/recoleta/empanadas-delivery",
    "https://www.pedidosya.com.ar/restaurantes/buenos-aires/almagro/empanadas-delivery",
    "https://www.pedidosya.com.ar/restaurantes/buenos-aires/palermo/empanadas-delivery",
    "https://www.pedidosya.com.ar/restaurantes/buenos-aires/villa-crespo/empanadas-delivery",
    "https://www.pedidosya.com.ar/restaurantes/buenos-aires/balvanera/empanadas-delivery",
]

# writing (utf-8 so names like "Cümen-Cümen" are saved correctly on every platform)
with open("output.csv", "w", newline="", encoding="utf-8") as csvfile:
    writer = csv.writer(csvfile, delimiter=",")
    writer.writerow(["subdomain", "name", "url"])  # delete this line if you don't want the header
    for url in urls:
        base = url + "?bt=RESTAURANT&page="
        page = 1
        restaurants = []  # per-URL results, kept in case you also need them in memory
        while True:
            soup = bs(requests.get(base + str(page)).text, "html.parser")
            sections = soup.find_all("section", attrs={"class": "restaurantData"})
            if not sections:  # an empty page means there are no more results for this URL
                break
            for section in sections:
                for elem in section.find_all("a", href=True, attrs={"class": "arrivalName"}):
                    restaurants.append({"name": elem.text, "url": elem["href"]})
                    # one row per restaurant: listing page URL, restaurant name, restaurant URL
                    writer.writerow([base + str(page), elem.text, elem["href"]])
            page += 1

# reading
with open("output.csv", "r", newline="", encoding="utf-8") as file:
    reader = csv.reader(file)
    for row in reader:
        # each row is a list of strings, which you can do whatever you want with
        print(row)
Here is the output:
subdomain,name,url
https://www.pedidosya.com.ar/restaurantes/buenos-aires/recoleta/empanadas-delivery?bt=RESTAURANT&page=1,Cümen-Cümen Empanadas Palermo,https://www.pedidosya.com.ar/restaurantes/buenos-aires/cumen-cumen-empanadas-palermo-menu
https://www.pedidosya.com.ar/restaurantes/buenos-aires/recoleta/empanadas-delivery?bt=RESTAURANT&page=1,El Maitén Empanadas - Al horno o fritas,https://www.pedidosya.com.ar/restaurantes/buenos-aires/el-maiten-empanadas-al-horno-o-fritas-menu
https://www.pedidosya.com.ar/restaurantes/buenos-aires/recoleta/empanadas-delivery?bt=RESTAURANT&page=1,Cümen-Cümen Empanadas - Barrio Norte,https://www.pedidosya.com.ar/restaurantes/buenos-aires/cumen-cumen-empanadas-barrio-norte-menu
https://www.pedidosya.com.ar/restaurantes/buenos-aires/recoleta/empanadas-delivery?bt=RESTAURANT&page=1,La Carbonera,https://www.pedidosya.com.ar/restaurantes/buenos-aires/la-carbonera-menu
https://www.pedidosya.com.ar/restaurantes/buenos-aires/recoleta/empanadas-delivery?bt=RESTAURANT&page=1,Tatú Empanadas Salteñas Palermo,https://www.pedidosya.com.ar/restaurantes/buenos-aires/tatu-empanadas-saltenas-palermo-menu
https://www.pedidosya.com.ar/restaurantes/buenos-aires/recoleta/empanadas-delivery?bt=RESTAURANT&page=1,Morita Palermo,https://www.pedidosya.com.ar/restaurantes/buenos-aires/morita-palermo-menu
https://www.pedidosya.com.ar/restaurantes/buenos-aires/recoleta/empanadas-delivery?bt=RESTAURANT&page=1,Doña Eulogia,https://www.pedidosya.com.ar/restaurantes/buenos-aires/dona-eulogia-menu
...
...
...
Output when reading the CSV with Python:
['subdomain', 'name', 'url']
['https://www.pedidosya.com.ar/restaurantes/buenos-aires/recoleta/empanadas-delivery?bt=RESTAURANT&page=1', 'Cümen-Cümen Empanadas Palermo', 'https://www.pedidosya.com.ar/restaurantes/buenos-aires/cumen-cumen-empanadas-palermo-menu']
['https://www.pedidosya.com.ar/restaurantes/buenos-aires/recoleta/empanadas-delivery?bt=RESTAURANT&page=1', 'El Maitén Empanadas - Al horno o fritas', 'https://www.pedidosya.com.ar/restaurantes/buenos-aires/el-maiten-empanadas-al-horno-o-fritas-menu']
...
...
...
So when you read the CSV back, you get the lists shown above, one per row, which you can iterate over.
Good luck!
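`csv.reader` gives positional lists; if you would rather access columns by name, `csv.DictReader` uses the header row as keys. A minimal sketch, assuming the `subdomain,name,url` header written by the answer's code, that groups restaurant names by the page URL stored in the `subdomain` column; `group_by_listing` is a hypothetical helper name.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_listing(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each 'subdomain' value (the listing page URL) to the restaurant names found on it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(lines):  # the header row supplies the dict keys
        groups[row["subdomain"]].append(row["name"])
    return dict(groups)
```

With the real file, pass the open file object, e.g. `with open("output.csv", newline="", encoding="utf-8") as f: groups = group_by_listing(f)`, using the same encoding the file was written with.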