Iterating over a proxy list with Selenium in Python

Problem description

I am making a Twitter scraper that grabs the password-recovery information of Twitter accounts. The problem is that I need to change the proxy every 4 iterations. This is what I came up with:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import smtplib
import time
import sys
import os

#reads the usernames (one per line) into a list
with open('list.txt', 'r') as f:
    users = [line.strip() for line in f]
#where results are stored
result = open(r"results.txt","w")
#stores the proxies
with open('proxy.txt', 'r') as f:
    proxies = [line.strip() for line in f]
counter = 0
for user in users:
        
    #adding proxy
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--proxy-server=http://%s' % proxies[counter])
    #initialization of the driver
    browser = webdriver.Chrome(chrome_options=chrome_options)
    counter = counter + 1
    #stage 1: enter and submit the username
    url = 'https://twitter.com/account/begin_password_reset'
    browser.get(url)
    textbox = browser.find_element_by_xpath('/html/body/div[2]/div/form/input[2]')
    textbox.send_keys(user)
    submit_button = browser.find_element_by_xpath('/html/body/div[2]/div/form/input[3]')
    submit_button.click()
    validator= browser.find_element_by_xpath('/html/body/div[2]/div/form/input[2]')
    validator2=browser.find_element_by_xpath('/html/body/div[2]/div/a')
    if not validator or not validator2:
        
        name = browser.find_element_by_xpath("/html/body/div[2]/div/div[2]/div/div[1]").text
        email= browser.find_element_by_xpath("/html/body/div[2]/div/form/ul/li/label/strong").text
        result.write(user+ ":"+ name+":"+email) 
 

The problem I am facing is that the proxies are not being rotated, and some of them are dead, which stops the whole program.

Tags: python, selenium, proxy

Solution


To solve your problem, you can pick a different random proxy from your list every fourth iteration and wrap the work in a try/except block to handle dead proxies, like this:

import random

for i, user in enumerate(users):
    # pick a new random proxy every fourth iteration
    if i % 4 == 0:
        proxy = random.choice(proxies)
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--proxy-server=http://%s' % proxy)
    try:
        browser = webdriver.Chrome(chrome_options=chrome_options)
        # ... continue with your scraping ...
    except Exception as e:
        print(e)

You can easily improve this snippet, for example by catching the specific error raised for a dead proxy and removing that proxy from the list (with .pop()), or by adding a while loop so the same iteration is retried after an error.
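Here is a minimal sketch of that idea. It assumes a dead proxy surfaces as a WebDriverException when Chrome starts or when the page is loaded; the scrape_user call is a hypothetical placeholder for the scraping code from your question, not part of the original answer:

import random
from selenium import webdriver
from selenium.common.exceptions import WebDriverException

for user in users:
    while proxies:
        # pick a random proxy and remember its index so it can be .pop()ed later
        idx = random.randrange(len(proxies))
        proxy = proxies[idx]
        chrome_options = webdriver.ChromeOptions()
        chrome_options.add_argument('--proxy-server=http://%s' % proxy)
        try:
            browser = webdriver.Chrome(chrome_options=chrome_options)
        except WebDriverException as e:
            # Chrome could not start with this proxy: drop it and try another
            print('removing dead proxy %s: %s' % (proxy, e))
            proxies.pop(idx)
            continue
        try:
            scrape_user(browser, user)   # placeholder for your scraping logic
            break                        # success: move on to the next user
        except WebDriverException as e:
            # the page did not load through this proxy: drop it and retry this user
            print('removing dead proxy %s: %s' % (proxy, e))
            proxies.pop(idx)
        finally:
            browser.quit()

Whether a dead proxy actually raises a WebDriverException (rather than silently showing an error page) depends on the proxy and on Chrome's timeouts, so you may also want to call browser.set_page_load_timeout() before browser.get() to make failures surface quickly.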

