My script throws an error when I try it with multiprocessing

Problem description

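I wrote a script in Python that uses the multiprocessing library to scrape certain fields from a web page. Because I don't know how to use multiprocessing properly, I get an error when I run the following script: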

import requests 
from lxml.html import fromstring
from multiprocessing import Process

link = "https://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los%20Angeles%2C%20CA&page={}"

def create_links(url):
    response = requests.get(url).text
    tree = fromstring(response)
    for title in tree.cssselect("div.info"):
        name = title.cssselect("a.business-name span")[0].text
        street = title.cssselect("span.street-address")[0].text
        try:
            phone = title.cssselect("div[class^=phones]")[0].text
        except IndexError:
            phone = ""
        print(name, street, phone)

if __name__ == '__main__':
    links = [link.format(page) for page in range(4)]
    p = Process(target=create_links, args=(links,))
    p.start()
    p.join()

The error I get:

722, in get_adapter
    raise InvalidSchema("No connection adapters were found for '%s'" % url)
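Where the message comes from, as a minimal sketch: create_links receives the whole list as url, and requests converts a non-string URL to text with str() before it picks a connection adapter by URL prefix. The text of a list starts with ['https:// rather than https://, so neither of the prefixes that a default requests Session mounts (http:// and https://) matches, and InvalidSchema is raised. The snippet below only reproduces that prefix check with the standard library; has_adapter and DEFAULT_ADAPTER_PREFIXES are hypothetical names, not part of requests.

```python
# Illustration only: mimics the prefix check that requests performs when it
# chooses a connection adapter, using the two prefixes a default Session mounts.
DEFAULT_ADAPTER_PREFIXES = ("https://", "http://")

link = "https://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los%20Angeles%2C%20CA&page={}"
links = [link.format(page) for page in range(4)]


def has_adapter(url):
    # requests turns a non-str URL into text with str() before matching prefixes
    return str(url).lower().startswith(DEFAULT_ADAPTER_PREFIXES)


print(has_adapter(links[0]))  # True: a single URL string
print(has_adapter(links))     # False: str(links) starts with "['https://"
```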

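I get this error because the script treats the whole list of links as a single link, even though I know I have to pass the list of links through args=(links,). How can I run it successfully?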
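One way to keep using Process, sketched under the assumption that each page should get its own child process: args is the tuple of positional arguments passed to target, so args=(links,) means create_links(links) and args=(url,) means create_links(url). Starting one Process per URL therefore gives each child a single link. In this sketch create_links is a hypothetical stand-in that only prints what it received (so it runs without network access), and make_processes and run_all are illustrative helper names.

```python
from multiprocessing import Process

link = "https://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los%20Angeles%2C%20CA&page={}"


def create_links(url):
    # Hypothetical stand-in for the scraper in the question: it only reports
    # the type and value of its argument instead of downloading the page.
    print(f"{type(url).__name__}: {url}")


def make_processes(urls):
    # args is the tuple of positional arguments for target: args=(url,)
    # makes each child call create_links(url) with one URL string, whereas
    # args=(links,) calls create_links(links) once with the whole list.
    return [Process(target=create_links, args=(url,)) for url in urls]


def run_all(urls):
    processes = make_processes(urls)
    for p in processes:
        p.start()  # launch every child first so they run concurrently
    for p in processes:
        p.join()   # then wait for all of them to finish
    return [p.exitcode for p in processes]


if __name__ == "__main__":
    print(run_all([link.format(page) for page in range(4)]))
```

Run as a script, it prints one "str: ..." line per URL (in whatever order the children get to it) followed by the list of exit codes. One process per URL is fine for four pages; for longer lists a Pool keeps the number of processes bounded.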

Tags: python, python-3.x, web-scraping, multiprocessing

Solution


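It works fine with a Pool. Process(target=create_links, args=(links,)) starts one child process that calls create_links(links) a single time, so url is the whole list and requests.get fails. Pool.map(create_links, links) instead calls create_links once for each URL in links and distributes those calls across the pool's worker processes: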

import requests 
from lxml.html import fromstring
from multiprocessing import Pool

link = "https://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los%20Angeles%2C%20CA&page={}"

def create_links(url):
    response = requests.get(url).text
    tree = fromstring(response)
    for title in tree.cssselect("div.info"):
        name = title.cssselect("a.business-name span")[0].text
        street = title.cssselect("span.street-address")[0].text
        try:
            phone = title.cssselect("div[class^=phones]")[0].text
        except IndexError:
            phone = ""
        print(name, street, phone)


links = [link.format(page) for page in range(4)]

def main():
    with Pool(4) as p:
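        # create_links prints its own rows and returns None, so printing the
        # result of map would only add [None, None, None, None]
        p.map(create_links, links)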

if __name__ == '__main__':
    main()

Output

Caffe Latte 6254 Wilshire Blvd (323) 936-5213
Bourgeois Pig 5931 Franklin Ave (323) 464-6008
Beard Papa Sweet Cafe 6801 Hollywood Blvd Ste 157 (323) 462-6100
Intelligentsia Coffee 3922 W Sunset Blvd (323) 663-6173
The Downbeat Cafe 1202 N Alvarado St (213) 483-3955
Sabor Y Cultura 5625 Hollywood Blvd (323) 466-0481
The Wood Cafe 12000 Washington Pl (310) 915-9663
Groundwork Coffee Inc 1501 N Cahuenga Blvd (323) 871-0143
The Apple Pan 10801 W Pico Blvd (310) 475-3585
Good Microbrew & Grill 3725 W Sunset Blvd (323) 660-3645
The Standard Hollywood 8300 W Sunset Blvd (323) 650-9090
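create_links in the answer prints its rows and returns None, so the list that p.map returns holds one None per page and no scraped data reaches the parent process. A sketch of a variant where the worker returns its rows instead: scrape_page, flatten and scrape_all are hypothetical names, and scrape_page returns one placeholder row per page so the sketch runs offline. In real use its body would be the requests/lxml code from create_links, appending (name, street, phone) to a list instead of printing.

```python
from multiprocessing import Pool


def scrape_page(url):
    # Hypothetical stand-in for create_links: it returns rows as
    # (name, street, phone) tuples instead of printing them. The placeholder
    # row below only keeps the sketch runnable without network access.
    page = url.rsplit("=", 1)[-1]
    return [(f"name-{page}", f"street-{page}", f"phone-{page}")]


def flatten(pages):
    # pool.map returns one list of rows per URL; join them into one list
    return [row for rows in pages for row in rows]


def scrape_all(urls, processes=4):
    with Pool(processes) as pool:
        return flatten(pool.map(scrape_page, urls))


if __name__ == "__main__":
    link = "https://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los%20Angeles%2C%20CA&page={}"
    for name, street, phone in scrape_all([link.format(page) for page in range(4)]):
        print(name, street, phone)
```

Pool.map returns results in the same order as the input URLs, so the rows come back grouped page by page even though the pages are fetched in parallel. Printing in the parent also keeps lines from different workers from interleaving.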
