Killing a method that hangs, via multithreading/multiprocessing

Problem description

I have a script that visits the website of a local weather station every hour and scrapes the current rainfall. This feeds my sprinkler server/database/etc. Occasionally something goes wrong with the network connection or on the website's side, and then the method hangs indefinitely, which is bad for the stability of my program.

I have tried multiprocessing, but I cannot get it to work properly. Ideally it would start the scraping module and then check back up to 10 times, once per second, to see whether the method has finished and produced output. If it takes longer than 10 seconds, it should kill the method and try again the next hour. How would you solve this?
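The idea, in rough outline, is something like the sketch below (scrape_rain is only a placeholder for the scraping module, not code I have working):

from multiprocessing import Process, Queue

def scrape_rain(q):
    # placeholder: do the actual scraping and put the result on the queue
    q.put(0.0)

def get_rain_with_timeout(timeout=10):
    q = Queue()
    p = Process(target=scrape_rain, args=(q,))
    p.start()
    p.join(timeout)        # wait at most `timeout` seconds for the worker
    if p.is_alive():       # still running, so assume it is hung
        p.terminate()      # kill it and give up until the next hour
        p.join()
        return None
    return q.get()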

My current script looks like this:

from multiprocessing import Process,Queue,Pipe
import time
import requests
import urllib.request
from bs4 import BeautifulSoup

url = "https://www.weerstationzoersel.be/weather2/index.php?p=10"
def get_rain():
    try:
        response = requests.get(url)
        responsestr = str(response)
        if "200" in responsestr:
            soup = BeautifulSoup(response.text, "html.parser")
            tags = soup.findAll('span')
            line_rain = str(tags[15])
            line_rain = line_rain[62::]
            rainfall = line_rain.rstrip("</span>")
            rainfall = round(float(rainfall.replace(',','.')),1)
    except:
        rainfall="error"
    return(rainfall)

if __name__ == '__main__':
    print(get_rain())

Tags: python, beautifulsoup, multiprocessing

Solution


The multiprocessing.pool.Pool class has an apply_async method that returns an AsyncResult object, on which you can call get to retrieve the return value. get accepts an optional timeout value, so that if the task has not completed and delivered a result within the specified number of seconds, a TimeoutError exception is raised. The process in the pool will, however, keep running the submitted task. That is not a problem: you simply terminate the pool and all of its running processes.

Note that I have modified get_rain to return the actual Exception object instead of the string "error" if an exception occurs:

from multiprocessing import Pool, TimeoutError
import requests
from bs4 import BeautifulSoup

def get_rain_worker():
    url = "https://www.weerstationzoersel.be/weather2/index.php?p=10"
    try:
        response = requests.get(url)
        # throw exception if error:
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        tags = soup.findAll('span')
        line_rain = str(tags[15])
        line_rain = line_rain[62::]
        rainfall = line_rain.rstrip("</span>")
        rainfall = round(float(rainfall.replace(',','.')),1)
    except Exception as e:
        # return the exception:
        rainfall = e
    return rainfall

def get_rain():
    # We just want a pool size of 1,
    # which will be automatically terminated at the end of the with block
    with Pool(1) as pool:
        async_result = pool.apply_async(get_rain_worker)
        try:
            # timeout after 10 seconds:
            rainfall = async_result.get(10)
        except TimeoutError as e:
            rainfall = e
    return rainfall

if __name__ == '__main__':
    rainfall = get_rain()
    if isinstance(rainfall, Exception):
        print('Got exception:', rainfall)
    else:
        print('rainfall = ', rainfall)

This prints:

rainfall =  0.0
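To exercise the timeout branch, you can temporarily swap in a worker that never finishes; a minimal sketch (hang_worker exists only for this demonstration):

import time
from multiprocessing import Pool, TimeoutError

def hang_worker():
    # simulate a scrape that hangs forever
    time.sleep(60)
    return 0.0

if __name__ == '__main__':
    with Pool(1) as pool:
        async_result = pool.apply_async(hang_worker)
        try:
            result = async_result.get(10)   # give up after 10 seconds
        except TimeoutError as e:
            result = e
    # leaving the with block terminates the pool and kills the hung worker
    if isinstance(result, TimeoutError):
        print('Timed out, will retry next hour')
    else:
        print('rainfall =', result)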

A simpler approach?

However, I believe the only call that can hang is the GET request itself, and requests.get does take an optional timeout argument. In fact, specifying a timeout is recommended for all production code. So the following may be all you need:

import requests
from bs4 import BeautifulSoup

def get_rain():
    url = "https://www.weerstationzoersel.be/weather2/index.php?p=10"
    try:
        response = requests.get(url, timeout=10)
        # throw exception if error:
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        tags = soup.findAll('span')
        line_rain = str(tags[15])
        line_rain = line_rain[62::]
        rainfall = line_rain.rstrip("</span>")
        rainfall = round(float(rainfall.replace(',','.')),1)
    except Exception as e:
        # return the exception:
        rainfall = e
    return rainfall

if __name__ == '__main__':
    rainfall = get_rain()
    if isinstance(rainfall, Exception):
        print('Got exception:', rainfall)
    else:
        print('rainfall = ', rainfall)
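One detail worth knowing: the timeout is not a cap on the total download time; requests applies it separately to establishing the connection and to each read from the socket, and you may pass a (connect, read) tuple to set the two limits independently. A timeout surfaces as requests.exceptions.Timeout, which the except Exception above already catches. A small sketch:

import requests

url = "https://www.weerstationzoersel.be/weather2/index.php?p=10"
try:
    # 3.05 s to connect, 10 s per read from the socket
    response = requests.get(url, timeout=(3.05, 10))
    response.raise_for_status()
except requests.exceptions.Timeout as e:
    print('Request timed out:', e)
except requests.exceptions.RequestException as e:
    print('Request failed:', e)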
