python - Can't get the script to parse the rest of the results that appear after a certain text
Problem description
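I'm trying to write a script in Python that scrapes the titles and links of different posts from a webpage when a specific condition is met. I want the script to print the rest of the results that appear after a specific text, such as "Alternative to Chromedriver" in this example. However, my current (wrong) attempt prints only the post titled "Alternative to Chromedriver" itself.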
import requests
from bs4 import BeautifulSoup

URL = "https://stackoverflow.com/questions/tagged/web-scraping?tab=Newest"
check_title = "Alternative to Chromedriver"

res = requests.get(URL)
soup = BeautifulSoup(res.text, 'html.parser')
for item in soup.select(".summary .question-hyperlink"):
    if check_title != item.get_text(strip=True):
        continue
    title = item.get_text(strip=True)
    link = item.get("href")
    print(title, link)
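To see why only the matching title is printed, here is a small offline sketch of the same loop logic. It runs against a hypothetical hard-coded list of (title, href) pairs instead of the live page; the titles and links are made up for illustration.

```python
# Hypothetical (title, href) pairs standing in for the scraped question links
items = [
    ("Scrapy pagination question", "/questions/1"),
    ("Alternative to Chromedriver", "/questions/2"),
    ("Parsing a table with BeautifulSoup", "/questions/3"),
    ("Selenium wait for element", "/questions/4"),
]
check_title = "Alternative to Chromedriver"


def original_logic(items, check_title):
    """Reproduce the question's loop: skip every item whose title differs."""
    selected = []
    for title, link in items:
        # `continue` fires for every non-matching title, so this acts as an
        # equality filter rather than a "start from here" marker
        if check_title != title:
            continue
        selected.append((title, link))
    return selected


print(original_logic(items, check_title))
# → [('Alternative to Chromedriver', '/questions/2')]
```

The condition has no memory between iterations, so items after the match are filtered out exactly like the ones before it; the loop needs some state that records that the match has already been seen.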
How can I get the script to parse the rest of the results that appear after a certain text?
Solution
Try using a flag that is switched on once the matching title has been found:
import requests
from bs4 import BeautifulSoup

URL = "https://stackoverflow.com/questions/tagged/web-scraping?tab=Newest"
check_title = "Alternative to Chromedriver"

res = requests.get(URL)
soup = BeautifulSoup(res.text, 'html.parser')

# Flag that records whether the matching title has been seen yet
start_printing = False
for item in soup.select(".summary .question-hyperlink"):
    title = item.get_text(strip=True)
    # Keep iterating until the required title is found; set the flag only once
    # and skip the matching item itself
    if not start_printing and check_title == title:
        start_printing = True
        continue
    # Every item after the match is printed
    if start_printing:
        link = item.get("href")
        print(title, link)
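An alternative to the flag, sketched with the standard library's itertools instead of the answer's approach: itertools.dropwhile discards items until the title matches, and itertools.islice then skips the matching item itself. The sample data below is hypothetical; with the live page you would feed it (item.get_text(strip=True), item.get("href")) pairs built from the soup.select(...) results.

```python
from itertools import dropwhile, islice


def items_after(items, check_title):
    """Return the (title, link) pairs that come after the first title equal to check_title.

    Returns an empty list if check_title never appears.
    """
    # dropwhile discards items until the predicate is first False,
    # i.e. until the matching title is reached (the match itself is kept)
    remaining = dropwhile(lambda pair: pair[0] != check_title, items)
    # islice(..., 1, None) skips that matching item and yields the rest
    return list(islice(remaining, 1, None))


# Hypothetical data standing in for the scraped question links
items = [
    ("Scrapy pagination question", "/questions/1"),
    ("Alternative to Chromedriver", "/questions/2"),
    ("Parsing a table with BeautifulSoup", "/questions/3"),
    ("Selenium wait for element", "/questions/4"),
]

for title, link in items_after(items, "Alternative to Chromedriver"):
    print(title, link)
```

Like the flag version, this prints everything after the first match (including any later item that happens to repeat the same title), and prints nothing if the title is not on the page.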