Can't get the script to parse the rest of the results that appear after a certain text

Problem description

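I'm trying to create a script in Python that scrapes the titles and links of different posts from a webpage when a certain condition is met. I want the script to print the rest of the results that come after a particular text, "Alternative to Chromedriver" in this example. However, my current (faulty) attempt prints only that text, "Alternative to Chromedriver", and nothing after it.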

import requests
from bs4 import BeautifulSoup

URL = "https://stackoverflow.com/questions/tagged/web-scraping?tab=Newest"
check_title = "Alternative to Chromedriver"

res = requests.get(URL)
soup = BeautifulSoup(res.text,'html.parser')

for item in soup.select(".summary .question-hyperlink"):
    if check_title!=item.get_text(strip=True):continue
    title = item.get_text(strip=True)
    link = item.get("href")
    print(title,link)

How can I make the script parse the rest of the results that appear after a certain text?

Tags: python, python-3.x, web-scraping

Solution


Try:

import requests
from bs4 import BeautifulSoup

URL = "https://stackoverflow.com/questions/tagged/web-scraping?tab=Newest"
check_title = "Alternative to Chromedriver"

res = requests.get(URL)
soup = BeautifulSoup(res.text,'html.parser')

# Initialise a flag to track where to start printing from 
start_printing = False

for item in soup.select(".summary .question-hyperlink"):
    title = item.get_text(strip=True)

    # Skip items until the required title is found; then set the flag once
    # and skip the matching item itself so only later results are printed
    if not start_printing and check_title == title:
        start_printing = True
        continue
    if start_printing:
        link = item.get("href")
        print(title,link)
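
The flag-based loop above can also be expressed with the standard library's itertools. The following is only a minimal sketch of that alternative, not part of the original answer: the helper name results_after and the sample (title, link) pairs are hypothetical and stand in for the values scraped from the page, so the example runs without a network request.

```python
from itertools import dropwhile, islice


def results_after(pairs, check_title):
    """Return the (title, link) pairs that come after the first pair
    whose title equals check_title (hypothetical helper)."""
    # dropwhile discards pairs until the title matches;
    # the matching pair is the first one it lets through
    remaining = dropwhile(lambda pair: pair[0] != check_title, pairs)
    # islice(..., 1, None) skips that matching pair itself
    return list(islice(remaining, 1, None))


# Hypothetical sample data standing in for the scraped (title, href) pairs
sample = [
    ("First post", "/questions/1"),
    ("Alternative to Chromedriver", "/questions/2"),
    ("Third post", "/questions/3"),
    ("Fourth post", "/questions/4"),
]

for title, link in results_after(sample, "Alternative to Chromedriver"):
    print(title, link)
```

With the scraper from the answer, the pairs could presumably be built from the selected elements, for example ((a.get_text(strip=True), a.get("href")) for a in soup.select(".summary .question-hyperlink")). If the title never appears, the helper returns an empty list, which matches the behaviour of the flag-based loop.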
