python - Unable to make my script keep trying a few times when a condition is not met
Problem
I've created a script in Python to scrape the titles of certain posts from different links on a webpage. The problem is that the site I'm working with sometimes fails to give me a valid response, but when I try two or three more times I do get a valid one.
I've been trying to create a loop so that the script checks whether the title I've defined is empty. If the title is empty, the script should keep retrying, up to 4 times, to see if it can succeed. After the fourth attempt on a link, the script should move on to the next link and repeat the same process until all the links are exhausted.
This is my attempt so far:
import time
import requests
from bs4 import BeautifulSoup

links = [
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2",
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=3",
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=4"
]

counter = 0

def fetch_data(link):
    global counter
    res = requests.get(link)
    soup = BeautifulSoup(res.text, "lxml")
    try:
        title = soup.select_one("p.tcode").text
    except AttributeError:
        title = ""

    if not title:
        while counter <= 4:
            time.sleep(1)
            print("trying {} times".format(counter))
            counter += 1
            fetch_data(link)
    else:
        counter = 0

    print("tried with this link:", link)

if __name__ == '__main__':
    for link in links:
        fetch_data(link)
This is the output I can see in the console at the moment:
trying 0 times
trying 1 times
trying 2 times
trying 3 times
trying 4 times
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=3
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=4
My expected output:
trying 0 times
trying 1 times
trying 2 times
trying 3 times
trying 4 times
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
trying 0 times
trying 1 times
trying 2 times
trying 3 times
trying 4 times
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=3
trying 0 times
trying 1 times
trying 2 times
trying 3 times
trying 4 times
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=4
PS: I deliberately used a wrong selector within my script so that it meets the condition I've defined above.
How can I make my script keep trying a few times with each link when the condition is not met?
Solution
I think rearranging your code as follows will do it. The key changes are resetting counter to 0 for each link inside the for loop (rather than only when a title is found) and moving the "tried with this link" print out of fetch_data(), so each link gets its own fresh set of retries and is reported exactly once.
import time
import requests
from bs4 import BeautifulSoup

links = [
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2",
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=3",
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=4"
]

def fetch_data(link):
    global counter
    res = requests.get(link)
    soup = BeautifulSoup(res.text, "lxml")
    try:
        title = soup.select_one("p.tcode").text
    except AttributeError:
        title = ""

    if not title:
        while counter <= 4:
            time.sleep(1)
            print("trying {} times".format(counter))
            counter += 1
            fetch_data(link)

if __name__ == '__main__':
    for link in links:
        counter = 0
        fetch_data(link)
        print("tried with this link:", link)