How do I create a while loop that keeps checking whether scraped data has changed?

Problem description

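I am scraping a website, and I want code that keeps scraping it and prints the data whenever it changes. If the data has not changed, nothing new should be printed. That way I don't have to keep clicking Run to see whether the data has changed.

I tried writing a while loop, but I don't know how to fit the data I fetch from the page into it.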

import urllib.request

from bs4 import BeautifulSoup

theurl = 'xyz'
thepage = urllib.request.urlopen(theurl)

soup = BeautifulSoup(thepage, 'html.parser')

# Text of the first <div class="sticky"> element
data = soup.find('div', class_='sticky').text

print(data)

Tags: python, python-3.x, while-loop, beautifulsoup

Solution


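This script should get you started. Every second, it scrapes the page and checks for a change. When the value changes, it yields the old and the new value: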

from bs4 import BeautifulSoup
import requests
from time import sleep

url = 'https://www.random.org/integers/?num=1&min=1&max=2&col=5&base=10&format=html&rnd=new'

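def get_data(url):
    # Download the page and parse it (the 'lxml' parser requires the lxml package)
    return BeautifulSoup(requests.get(url).text, 'lxml')

def watch(url, seconds=1):
    # Remember the value seen on the first fetch
    soup = get_data(url)
    old_data = soup.select_one('pre.data').text.strip()
    while True:
        sleep(seconds)
        soup = get_data(url)
        data = soup.select_one('pre.data').text.strip()
        # Report a change as an (old, new) pair, then keep polling
        if data != old_data:
            yield old_data, data
        old_data = data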

for old_val, new_val in watch(url):
    print('Data changed! Old value was {}, new value is {}'.format(old_val, new_val))

Prints (for example):

Data changed! Old value was 1, new value is 2
Data changed! Old value was 2, new value is 1
Data changed! Old value was 1, new value is 2
Data changed! Old value was 2, new value is 1
Data changed! Old value was 1, new value is 2
Data changed! Old value was 2, new value is 1

...and so on.

You will need to change the URL and select the right HTML element for your page.
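
As a rough illustration of that adaptation, here is a minimal sketch of the same polling idea applied to the question's setup, assuming the value lives in the first `div` with class `sticky` and that `'xyz'` is replaced by a real URL. The names `extract_text`, `fetch_html`, `watch_changes` and the `max_polls` parameter are hypothetical helpers introduced only for this example. Unlike the script above, it reports `None` when the element is missing instead of raising an error.

```python
from time import sleep
from urllib.request import urlopen

from bs4 import BeautifulSoup


def extract_text(html, selector):
    """Return the stripped text of the first element matching selector, or None."""
    element = BeautifulSoup(html, 'html.parser').select_one(selector)
    return element.get_text().strip() if element is not None else None


def fetch_html(url):
    """Download a page with urllib, as in the question (hypothetical helper)."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def watch_changes(fetch, selector, seconds=1, max_polls=None):
    """Yield (old, new) each time the selected text changes.

    fetch is any zero-argument callable that returns HTML.
    max_polls=None keeps polling forever; a number stops after that many polls.
    """
    old_data = extract_text(fetch(), selector)
    polls = 0
    while max_polls is None or polls < max_polls:
        sleep(seconds)
        polls += 1
        data = extract_text(fetch(), selector)
        if data != old_data:
            yield old_data, data
        old_data = data


# Usage with the question's placeholder URL (replace 'xyz' with a real address):
# for old_val, new_val in watch_changes(lambda: fetch_html('xyz'), 'div.sticky'):
#     print('Data changed! Old value was {}, new value is {}'.format(old_val, new_val))
```

Passing the fetch step in as a callable keeps the loop independent of how the page is downloaded, so the same function works with `urllib`, `requests`, or canned HTML when trying it out offline.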

