How do I fix "cannot serialize '_io.BufferedReader' object" when using multiprocessing?

Problem description

When trying to scrape a large number of pages from a website, I get the following error: Reason: 'TypeError("cannot serialize '_io.BufferedReader' object",)'. How can I fix it?

The full error message is:

File "main.py", line 29, in <module>
    records = p.map(defs.scrape,state_urls)
  File "C:\Users\Utilisateur\Anaconda3\lib\multiprocessing\pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\Utilisateur\Anaconda3\lib\multiprocessing\pool.py", line 644, in get
    raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '<multiprocessing.pool.ExceptionWithTraceback object at 0x0000018DD1C3D828>'. Reason: 'TypeError("cannot serialize '_io.BufferedReader' object",)'
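
For context on what the message means: Pool.map has to pickle both the value each call returns and any exception a worker raises in order to send them back to the parent process, and file objects from the io module refuse to be pickled. The following standalone sketch (not taken from the question) reproduces the same failure:

import pickle
from multiprocessing import Pool

def worker(_):
    # returning (or raising an exception that carries) an open file object
    # forces the pool to pickle it on the way back to the parent, which fails
    return open(__file__, "rb")   # an _io.BufferedReader

if __name__ == "__main__":
    try:
        pickle.dumps(open(__file__, "rb"))
    except TypeError as exc:
        print(exc)   # "cannot serialize '_io.BufferedReader' object" (wording varies across Python versions)

    with Pool(2) as p:
        p.map(worker, range(2))   # raises multiprocessing.pool.MaybeEncodingError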

I looked through the answers to some similar questions, in particular this one (multiprocessing.pool.MaybeEncodingError: Error sending result: Reason: 'TypeError("cannot serialize '_io.BufferedReader' object",)'), but I don't think I have the same problem, since I don't handle files directly in the scrape function.

I tried modifying the scrape function so that it returns a string rather than a list (not sure why I did that), but it didn't work.
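
As an aside, the return type was unlikely to be the culprit: both a list of strings and a single string pickle without trouble, as a quick check shows, so whatever the pool fails to serialize is something other than scrape's return value.

import pickle

# both candidate return types round-trip through pickle just fine
pickle.loads(pickle.dumps(["cell", ",", "\n"]))
pickle.loads(pickle.dumps("cell,cell\n"))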

From the main.py file:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup 
from multiprocessing import Pool
import codecs
import defs
if __name__ == '__main__':

    filename = "some_courts_test.csv"
    # not the actual values
    courts = ["blabla", "blablabla", "blablabla","blabla"]

    client = defs.init_client()
    i = 1

    # scrapes the data from the website and puts it into a csv file
    for court in courts:
        records = []
        records_string =""
        print("creating a file for the court of : "+court)
        f = defs.init_court_file(court)
        print("generating urls for the court of "+court)        
        state_urls = defs.generate_state_urls(court)
        for url in state_urls:
            print(url)
        print("scraping creditors from : "+court)
        p = Pool(10)

        records = p.map(defs.scrape,state_urls)
        records_string = ''.join(records[1])
        p.terminate()
        p.join()
        for r in records_string:
            f.write(r)
        records = []

        f.close()
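
Separately from the serialization error, two details in this loop are worth flagging: ''.join(records[1]) only keeps the rows scraped from the second URL, and terminate() is meant for abandoning work rather than for shutting down a pool whose map() has already finished. A possible reshaping of the loop body, assuming the defs helpers behave as shown above:

for court in courts:
    print("creating a file for the court of : " + court)
    f = defs.init_court_file(court)
    state_urls = defs.generate_state_urls(court)
    print("scraping creditors from : " + court)

    p = Pool(10)
    records = p.map(defs.scrape, state_urls)   # one list of cells per URL
    p.close()   # no further tasks will be submitted
    p.join()    # wait for the workers to exit cleanly

    # flatten every per-URL list, not just records[1], and write it out
    f.write(''.join(''.join(r) for r in records))
    f.close()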

From the defs file:

def scrape(url):
        data = []
        row_string = ' '
        final_data = []
        final_string = ' '
        uClient = uReq(url)
        page_html = uClient.read()
        uClient.close()

        page_soup = soup(page_html, "html.parser")
        table = page_soup.find("table", {"class":"table table-striped"})
        table_body = table.find('tbody')
        rows = table_body.find_all('tr')
        for row in rows:
            cols = row.find_all('td')
            cols = [ele.text.replace(',',' ') for ele in cols] #cleans it up

            for ele in cols:
                if ele:
                    data.append(ele)
                data.append(',')
            data.append('\n')
        return(data)

Tags: python, web-scraping, multiprocessing

Solution
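
Judging from the traceback, the TypeError is not about anything scrape returns: it is raised while the pool tries to pickle an exception that one of the workers raised, and that exception is dragging an open HTTP response (an _io.BufferedReader) along with it. With urllib this is typically urllib.error.HTTPError, which keeps the response object attached to itself, so a single URL answering with a 4xx/5xx status is enough to trigger the message. Catching network and parsing failures inside the worker, so that only plain picklable data ever travels back to the parent, makes the pool error disappear and the real failure visible. A hedged sketch of scrape along those lines (the helper names follow the question's code; the exact exceptions to catch are an assumption):

from urllib.request import urlopen as uReq
from urllib.error import HTTPError, URLError
from bs4 import BeautifulSoup as soup

def scrape(url):
    data = []
    try:
        uClient = uReq(url)
        page_html = uClient.read()
        uClient.close()
    except (HTTPError, URLError) as exc:
        # return only plain, picklable data; the exception itself (and the
        # response object it carries) never leaves the worker process
        print("failed to fetch " + url + ": " + str(exc))
        return data

    page_soup = soup(page_html, "html.parser")
    table = page_soup.find("table", {"class": "table table-striped"})
    table_body = table.find('tbody') if table is not None else None
    if table_body is None:
        # unexpected page layout: skip this URL instead of raising
        print("no results table found at " + url)
        return data

    for row in table_body.find_all('tr'):
        cols = [ele.text.replace(',', ' ') for ele in row.find_all('td')]  # clean up commas
        for ele in cols:
            if ele:
                data.append(ele)
            data.append(',')
        data.append('\n')
    return data

With this in place the pool only ever ships lists of strings back to the parent, and any URL that fails is reported from the worker instead of being turned into an unpicklable exception.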

