python - I'm not sure how to parallelize this for loop with the multiprocessing module
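Problem description
I want to use multiprocessing to reduce the time this for loop takes to complete, but I'm not sure how to go about it, because I haven't found a clear basic usage pattern for the module that I can apply to this code.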
allLines = fileRead.readlines()
allLines = [x.strip() for x in allLines]
for i in range(0, len(allLines)):
    currentWord = allLines[currentLine]
    currentLine += 1
    currentURL = URL + currentWord
    uClient = uReq(currentURL)
    pageHTML = uClient.read()
    uClient.close()
    pageSoup = soup(pageHTML, 'html.parser')
    pageHeader = str(pageSoup.h1)
    if 'Sorry!' in pageHeader:
        with open(fileA, 'a') as fileAppend:
            fileAppend.write(currentWord + '\n')
        print(currentWord, 'available')
    else:
        print(currentWord, 'taken')
Edit: new code, but it's still broken...
allLines = fileRead.readlines()
allLines = [x.strip() for x in allLines]
def f(indexes, allLines):
    for i in indexes:
        currentWord = allLines[currentLine]
        currentLine += 1
        currentURL = URL + currentWord
        uClient = uReq(currentURL)
        pageHTML = uClient.read()
        uClient.close()
        pageSoup = soup(pageHTML, 'html.parser')
        pageHeader = str(pageSoup.h1)
        if 'Sorry!' in pageHeader:
            with open(fileA, 'a') as fileAppend:
                fileAppend.write(currentWord + '\n')
            print(currentWord, 'available')
        else:
            print(currentWord, 'taken')
for i in range(threads):
    indexes = range(i*len(allLines), i*len(allLines)+threads, 1)
    Thread(target=f, args=(indexes, allLines)).start()
Solution
- Put the code in a function
- Split the indexes
- Start the threads
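The "split the indexes" step can be sketched on its own. This is a minimal illustration, not part of the original answer: `split_indexes` is a hypothetical helper name, and it assumes a strided split, where worker `w` takes items `w`, `w + workers`, `w + 2*workers`, and so on.

```python
def split_indexes(total, workers):
    """Return one range per worker; together they cover 0..total-1 exactly once."""
    return [range(w, total, workers) for w in range(workers)]


# Seven items shared among three workers.
chunks = split_indexes(7, 3)
print([list(c) for c in chunks])  # [[0, 3, 6], [1, 4], [2, 5]]
```

A strided split keeps every index inside the list no matter how many workers there are, and the chunk sizes differ by at most one.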
from threading import Thread
from urllib.request import urlopen as uReq   # assumed origin of uReq
from bs4 import BeautifulSoup as soup        # assumed origin of soup

THREADS = 10

# fileRead, URL and fileA are defined elsewhere, as in the question
allLines = fileRead.readlines()
allLines = [x.strip() for x in allLines]

def f(indexes, allLines):
    # Each thread handles only the line numbers it was given
    for i in indexes:
        currentWord = allLines[i]
        currentURL = URL + currentWord
        uClient = uReq(currentURL)
        pageHTML = uClient.read()
        uClient.close()
        pageSoup = soup(pageHTML, 'html.parser')
        pageHeader = str(pageSoup.h1)
        if 'Sorry!' in pageHeader:
            with open(fileA, 'a') as fileAppend:
                fileAppend.write(currentWord + '\n')
            print(currentWord, 'available')
        else:
            print(currentWord, 'taken')

# Thread i takes lines i, i+THREADS, i+2*THREADS, ... so each line is checked exactly once
for i in range(THREADS):
    indexes = range(i, len(allLines), THREADS)
    Thread(target=f, args=(indexes, allLines)).start()
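As an alternative to starting threads by hand, the standard library's `concurrent.futures.ThreadPoolExecutor` can spread the words across a fixed number of workers. The sketch below is only illustrative and is not from the original answer: `check_words` and `fake_is_available` are hypothetical names, and `fake_is_available` stands in for the real fetch-and-parse step so that the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def check_words(words, is_available, max_workers=10):
    """Call is_available(word) for every word on a thread pool.

    Returns (word, result) pairs in the same order as `words`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, whichever thread finishes first
        results = list(pool.map(is_available, words))
    return list(zip(words, results))


def fake_is_available(word):
    # Placeholder for the real check (fetch the page, look for 'Sorry!' in the h1).
    return len(word) % 2 == 0


for word, free in check_words(["ab", "abc", "abcd"], fake_is_available):
    print(word, "available" if free else "taken")
```

Collecting the results in the main thread also means the output file could be written once at the end, rather than appended to from several threads at the same time.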