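python - My script throws an error when I try to use multiprocessing
Problem description
I've written a Python script that uses the multiprocessing library to scrape certain fields from a web page. Since I don't really know how to use multiprocessing, I get an error when I run the following script: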
import requests
from lxml.html import fromstring
from multiprocessing import Process

link = "https://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los%20Angeles%2C%20CA&page={}"

def create_links(url):
    response = requests.get(url).text
    tree = fromstring(response)
    for title in tree.cssselect("div.info"):
        name = title.cssselect("a.business-name span")[0].text
        street = title.cssselect("span.street-address")[0].text
        try:
            phone = title.cssselect("div[class^=phones]")[0].text
        except IndexError:
            phone = ""
        print(name, street, phone)

if __name__ == '__main__':
    links = [link.format(page) for page in range(4)]
    p = Process(target=create_links, args=(links,))
    p.start()
    p.join()
The error I'm getting:
722, in get_adapter
raise InvalidSchema("No connection adapters were found for '%s'" % url)
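I get this error because the script treats the whole list of links as a single link, even though I understood that the list of links has to be passed in args=(links,). How can I get it to run successfully?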
Solution
It works fine with Pool:
import requests
from lxml.html import fromstring
from multiprocessing import Pool

link = "https://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los%20Angeles%2C%20CA&page={}"

def create_links(url):
    # Each call receives exactly one URL string from Pool.map.
    response = requests.get(url).text
    tree = fromstring(response)
    for title in tree.cssselect("div.info"):
        name = title.cssselect("a.business-name span")[0].text
        street = title.cssselect("span.street-address")[0].text
        try:
            phone = title.cssselect("div[class^=phones]")[0].text
        except IndexError:
            # Some listings have no phone element.
            phone = ""
        print(name, street, phone)

links = [link.format(page) for page in range(4)]

def main():
    with Pool(4) as p:
        # map() calls create_links once per link, spread over 4 workers.
        # create_links returns nothing, so the printed result is a list of None.
        print(p.map(create_links, links))

if __name__ == '__main__':
    main()
Output
Caffe Latte 6254 Wilshire Blvd (323) 936-5213
Bourgeois Pig 5931 Franklin Ave (323) 464-6008
Beard Papa Sweet Cafe 6801 Hollywood Blvd Ste 157 (323) 462-6100
Intelligentsia Coffee 3922 W Sunset Blvd (323) 663-6173
The Downbeat Cafe 1202 N Alvarado St (213) 483-3955
Sabor Y Cultura 5625 Hollywood Blvd (323) 466-0481
The Wood Cafe 12000 Washington Pl (310) 915-9663
Groundwork Coffee Inc 1501 N Cahuenga Blvd (323) 871-0143
The Apple Pan 10801 W Pico Blvd (310) 475-3585
Good Microbrew & Grill 3725 W Sunset Blvd (323) 660-3645
The Standard Hollywood 8300 W Sunset Blvd (323) 650-9090
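As for the original Process attempt: args=(links,) is a one-element tuple, so the target is called as create_links(links) and receives the whole list as its single argument. If you would rather stay with Process than switch to Pool, one option is to start one process per link. The following is only a minimal sketch: describe_argument, show and run_one_process_per_link are hypothetical helper names, and show is a stand-in for create_links that reports what it received instead of making a request.

```python
from multiprocessing import Process

# Same URL template as in the question.
LINK = ("https://www.yellowpages.com/search?search_terms=coffee"
        "&geo_location_terms=Los%20Angeles%2C%20CA&page={}")


def describe_argument(url):
    """Hypothetical stand-in for create_links: report what was received."""
    return "{}: {!r}".format(type(url).__name__, url)


def show(url):
    # Runs in the child process; this sketch makes no network request.
    print(describe_argument(url))


def run_one_process_per_link(target, links):
    """Start one Process per link so each call gets a single URL string."""
    processes = [Process(target=target, args=(url,)) for url in links]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    # Exit codes of the children, in the same order as the links.
    return [p.exitcode for p in processes]


if __name__ == "__main__":
    links = [LINK.format(page) for page in range(4)]
    run_one_process_per_link(show, links)
```

Replacing show with create_links should give each process one page to scrape. Pool is usually the more convenient choice when there are many links, because it caps the number of worker processes.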