BS4 MemoryError: stack overflow and EOFError: Ran out of input when using multiprocessing in python

Problem description

I have a simple Python script that uses the BS4 library and multiprocessing to do some web scraping. I initially ran into errors where the script would not finish because I kept exceeding the recursion limit, but then I found out here that BeautifulSoup trees cannot be pickled and therefore cause problems with multiprocessing, so I followed one of the suggestions in the top answer, which was to do the following: sys.setrecursionlimit(25000)
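To make the pickling issue concrete, here is a small standalone snippet (a toy example, not my script) showing that pickling a parsed tree recurses through every level of nesting, so a deeply nested page blows through the default limit of 1000:

import pickle
from bs4 import BeautifulSoup

# Build an artificially deep document: 2000 nested <div> tags.
html = "<div>" * 2000 + "text" + "</div>" * 2000
soup = BeautifulSoup(html, "html.parser")

try:
    pickle.dumps(soup)
except RecursionError as exc:
    print("pickling the tree hit the recursion limit:", exc)

Raising the recursion limit lets deeper trees be pickled, but it only pushes the ceiling further out.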

This worked without issue for a few weeks (as far as I can tell), but today I restarted the script and some of the processes no longer work, and I get the error you can see below:

This is the error I am now receiving:

Traceback (most recent call last):
  File "C:/Users/user/PycharmProjects/foo/single_items/single_item.py", line 243, in <module>
    Process(target=instance.constant_thread).start()
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\site-packages\bs4\element.py", line 1449, in __getattr__
    "'%s' object has no attribute '%s'" % (self.__class__, tag))
MemoryError: stack overflow
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

I am not sure what this means, but here is a pseudocode example of the script I am running:

import sys
from multiprocessing import Process

from bs4 import BeautifulSoup


class foo:
    def __init__(self, url):
        self.url = url

    def constant_scrape(self):
        while True:
            rq = make_get_request(self.url)   # placeholder for the HTTP GET
            soup = BeautifulSoup(rq)


if __name__ == '__main__':

    sys.setrecursionlimit(25000)

    url_list = [...]

    for url in url_list:
        instance = foo(url)
        Process(target=instance.constant_scrape).start()
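If I am reading the traceback right, the failure happens inside reduction.dump, i.e. while Process.start() on Windows pickles its target for the spawned child, and a bound method like instance.constant_scrape drags the whole instance (and anything it references) along with it. A workaround I am considering, sketched below with requests assumed for the HTTP GET (so not my exact code), is to hand the child a plain module-level function and only the URL string, so nothing BS4-related ever crosses the process boundary:

from multiprocessing import Process

import requests                      # assumption: requests used for the HTTP GET
from bs4 import BeautifulSoup


def constant_scrape(url):
    # Everything BS4-related stays inside the child process; only the url
    # string gets pickled when the process is spawned.
    while True:
        rq = requests.get(url)
        soup = BeautifulSoup(rq.text, "html.parser")
        # ... pull whatever is needed out of soup here ...


if __name__ == '__main__':
    url_list = [...]
    for url in url_list:
        Process(target=constant_scrape, args=(url,)).start()

With this layout the soup only ever exists inside the child process, so the recursion limit hack should not be needed for the pickling step.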

Update 1: It seems to be the same URLs that crash every time, even though the HTML of each of those URLs (apparently) has the same format as the URLs that do not crash.
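To figure out what is special about those URLs, a check I am thinking of running (not part of my original script; requests assumed again) is to measure how deeply nested each page's parse tree is, since nesting depth is what the recursive pickling is sensitive to:

import requests
from bs4 import BeautifulSoup


def max_depth(soup):
    # Deepest nesting level of the parsed tree, computed without recursion.
    return max((len(list(node.parents)) for node in soup.descendants), default=0)


for url in url_list:                 # same url_list as in the pseudocode above
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    print(url, max_depth(soup))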

Tags: python, python-3.x, beautifulsoup, multiprocessing

Solution

