python - How to rotate proxies with concurrent.futures ThreadPoolExecutor
Question
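This Python code scrapes titles and content from Wikipedia. It looks up each search term listed in a CSV file and writes the page title and body text to a .csv file per term. The code runs fine with ThreadPoolExecutor.
I need to rotate proxies, but I don't know how to do that with ThreadPoolExecutor.
(I forgot to mention that I am using scraped free proxies.)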
scrapelist.csv
apple
banana
mango
etc.
.
proxylist.csv
161.35.22.17:80
165.227.108.19:80
161.35.52.72:80
135.125.107.126:80
15.188.22.231:3128
.
import requests
import csv
import time
import random
from datetime import datetime
from bs4 import BeautifulSoup
import concurrent.futures

namelist = []
proxylist = []

# Load the search terms, one per row.
with open('scrapelist.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        namelist.append(row[0])

# Load the proxies (host:port), one per row. They are not used anywhere yet.
with open('proxylist.csv', 'r') as prlist:
    reader = csv.reader(prlist)
    for row in reader:
        proxylist.append(row[0])

def scrapecontent(search_term):
    url = "https://en.wikipedia.org/wiki/" + search_term
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    title = soup.find('h1', {"class": "firstHeading"}).text
    alltext = soup.find('div', {"class": "mw-parser-output"}).text.replace("\n", " ")
    now = datetime.now()  # current date and time
    date_time = now.strftime("%Y%m%d_%H%M%S")
    filename = search_term + "_" + date_time
    myfilename = "%s.csv" % filename
    print("-----\n" + myfilename)
    # The with block closes the file, so no explicit close() is needed.
    with open(myfilename, 'w', encoding="utf-8", newline='') as myfile:
        wikititle = title
        wikibody = alltext
        writer = csv.writer(myfile)
        writer.writerow([wikititle, wikibody])

with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(scrapecontent, namelist)
Solution
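The page records no answer, so what follows is only a minimal sketch of one possible approach, not a confirmed solution. It assumes each worker thread picks its own proxies at random from proxylist and moves on to the next one when a request fails, which avoids sharing a rotating iterator between threads. The names build_proxies, fetch_via_requests, fetch_with_rotation and scrape_all are hypothetical, as are the max_attempts, max_workers and timeout values. The only requests features used are the proxies and timeout arguments of requests.get and raise_for_status().

```python
import concurrent.futures
import random
from functools import partial


def build_proxies(proxy):
    """Turn a 'host:port' string into the mapping that requests.get accepts."""
    address = "http://" + proxy
    return {"http": address, "https": address}


def fetch_via_requests(url, proxy, timeout=10):
    """Fetch url through a single proxy and return the raw body."""
    # Imported here so the rotation logic below has no hard dependency on requests.
    import requests

    response = requests.get(url, proxies=build_proxies(proxy), timeout=timeout)
    response.raise_for_status()
    return response.content


def fetch_with_rotation(search_term, proxylist, max_attempts=5,
                        fetcher=fetch_via_requests):
    """Try up to max_attempts distinct, randomly chosen proxies for one term.

    Returns (search_term, proxy_used, body), or (search_term, None, None)
    when every attempted proxy failed.
    """
    url = "https://en.wikipedia.org/wiki/" + search_term
    candidates = random.sample(proxylist, min(max_attempts, len(proxylist)))
    for proxy in candidates:
        try:
            return search_term, proxy, fetcher(url, proxy)
        except OSError:
            # requests.RequestException derives from OSError, so this covers
            # dead proxies, timeouts and HTTP error statuses alike.
            continue
    return search_term, None, None


def scrape_all(namelist, proxylist, max_workers=8, fetcher=fetch_via_requests):
    """Run fetch_with_rotation for every term in a thread pool, in input order."""
    worker = partial(fetch_with_rotation, proxylist=proxylist, fetcher=fetcher)
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(worker, namelist))
```

With the lists from the question, scrape_all(namelist, proxylist) would return one tuple per search term. The body can then be parsed with BeautifulSoup and written to a CSV file exactly as scrapecontent does. Free proxies fail often, so a term whose result is (term, None, None) may need a retry with a larger max_attempts.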