How to create a while loop that continuously checks whether scraped data in a list has changed

Problem description

import time
from bs4 import BeautifulSoup
import requests
from urllib.request import Request, urlopen

pages = ["movies", "series"]
printed = []
for page in pages:
    req = Request("https://www.thenetnaija.com/videos/" + page, headers={'User-Agent': 'XYZ/3.0'})

    webpage = urlopen(req, timeout=10)

    b4 = BeautifulSoup(webpage, "html.parser")

    movie_list = b4.find_all("div", {"class" : "video-files"})


    for allContainers in movie_list:
        filmName = allContainers.find('img').get('alt')
        printed.append(filmName)
        print(printed)
for get in printed:
    requests.get("https://api.telegram.org/bot:AAEapVykIXdphGYaH5ZjXuhpFaFw7wpi5Bs/sendMessage?chat_id=&text={}".format(get))

I want to use a while loop to run the program indefinitely, and only send a request to my Telegram chat when the data in the list has changed.

Tags: python-3.x, web-scraping, beautifulsoup, while-loop

Solution


You can use this example as a basis for periodically checking the movies/series pages (it uses set.difference to determine whether anything has changed):

import time
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen


def get_movies(url):
    headers = {"User-Agent": "XYZ/3.0"}
    req = Request(url, headers=headers)
    b4 = BeautifulSoup(urlopen(req, timeout=10), "html.parser")
    return set(a.get_text(strip=True) for a in b4.select("h2 a"))


url = "https://www.thenetnaija.com/videos/{}"
pages = {
    "movies": get_movies(url.format("movies")),
    "series": get_movies(url.format("series")),
}

while True:
    time.sleep(10)  # <-- sleep 10sec before checking again

    for k, v in pages.items():
        new_movies = get_movies(url.format(k))
        difference = new_movies.difference(v)

        if difference:
            print("New {}:".format(k))
            print(difference)
            pages[k] = new_movies

            # do stuff here (post to telegram etc.)
            # ...
        else:
            print("No new {}".format(k))
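To make the change-detection step concrete, here is a minimal illustration of how set.difference reports only the titles that appear in the newly scraped set but not in the stored one (the titles are made-up placeholders):

```python
# Titles stored from the previous check
old_titles = {"Movie A", "Movie B"}
# Titles from the latest scrape, with one new entry
new_titles = {"Movie A", "Movie B", "Movie C"}

# Only elements present in new_titles but missing from old_titles
print(new_titles.difference(old_titles))  # → {'Movie C'}
```

Note the direction matters: old_titles.difference(new_titles) would instead report titles that were removed from the site.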

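For the "do stuff here" branch, a possible sketch of posting each new title to Telegram using the same urllib machinery as the rest of the example. The build_url and notify helper names are hypothetical, and the bot token and chat id are placeholders you must supply yourself:

```python
from urllib.request import urlopen
from urllib.parse import quote


def build_url(bot_token, chat_id, text):
    # quote() escapes spaces and special characters so the title
    # survives intact in the query string
    return "https://api.telegram.org/bot{}/sendMessage?chat_id={}&text={}".format(
        bot_token, chat_id, quote(text)
    )


def notify(bot_token, chat_id, text):
    # Fire the sendMessage request; returns the HTTP response object
    return urlopen(build_url(bot_token, chat_id, text), timeout=10)


# Inside the `if difference:` branch, something like:
# for title in difference:
#     notify("<your-bot-token>", "<your-chat-id>", "New {}: {}".format(k, title))
```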