"List index out of range" exception when web scraping with Selenium

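Problem description

I am using Selenium to scrape data for a data science project, but I can't figure out why I get an index error in the write-to-csv part. When I print the data as is, the output looks fine.

The code is below: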

'''

driver = webdriver.Firefox(executable_path="/filepath/geckodriver.exe")

url = 'https://website.com'

driver.get(url)

with open('file.csv', 'w') as f:
    f.write('Column1', 'Column2', 'Column3', '\n')

ids = driver.find_elements_by_xpath('//*[@class="id-name"]')
id_list = []
for i in range(50):
    id_list.append(ids[i].text)
print(len(ids))
print(len(id_list))
print(id_list[0:50])

# Break up into batches to save memory
new_id_list = [id_list[i:i+5] for i in range(0,len(id_list),5)]

#time.sleep(1200)

for i in range(len(new_id_list)):
    for j in range(len(new_id_list[i])):
        url = 'http://www.website.com?id=' + str(id_list[j])
        driver.get(url)
        col1 = driver.find_elements_by_xpath('//*[@id="field-value-col_1"]/span/span')
        col2 = driver.find_elements_by_xpath('//h1[@id="field-value-col_2"]')
        col3 = driver.find_elements_by_xpath('//*[@id="field-value-col_3"]')

        print(id_list[i][j] + ',' + col1[0].text + ',' + col2[0].text + ',' + col3[0].text, '\n')

# This is where I get the error usually.

        with open('bugzilla.csv', 'w') as f:
            f.write(id_list[i][j] + ',' + col1[0].text + ',' + col2[0].text + ',' + col3[0].text, '\n')


    print('Batch of 5')

f.close()

'''

Tags: python, selenium, web-scraping, data-science, data-analysis

Solution

Here:

print(id_list[i][j] + ',' + col1[0].text + ',' + col2[0].text + ',' + col3[0].text, '\n')

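you index id_list as if it were a two-dimensional array, but earlier you defined it as a flat list of strings. As a result, id_list[i][j] selects the j-th character of the i-th ID string and raises an IndexError as soon as j reaches the length of that string: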

id_list = []
for i in range(50):
    id_list.append(ids[i].text)

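You probably meant to index the batched list instead:

print(new_id_list[i][j] + ',' + col1[0].text + ',' + col2[0].text + ',' + col3[0].text, '\n')

The same change applies to the f.write(...) call, and the URL should likewise be built from new_id_list[i][j] rather than id_list[j].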
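The difference between the flat list and the batched list can be checked without a browser. The sketch below is only an illustration: the ID values, the column texts, the four-column header and the write_rows helper are all hypothetical stand-ins for what the scraper would produce, and the rows are written with the standard csv module rather than hand-built strings.

```python
import csv
import io

# Hypothetical IDs standing in for the text scraped from the page.
id_list = ["1001", "1002", "1003", "1004", "1005", "1006", "1007"]

# Split the flat list into batches of 5, as in the question.
new_id_list = [id_list[i:i + 5] for i in range(0, len(id_list), 5)]

# id_list[i] is a string, so id_list[i][j] picks a single character
# and fails once j reaches the length of that string.
first_char = id_list[0][0]  # a character of "1001", not a whole ID
try:
    id_list[0][4]  # "1001" has only 4 characters
    raised = False
except IndexError:
    raised = True


def write_rows(batches, out):
    """Write a header and one row per ID to an open text file object.

    The header names and column texts are placeholders (assumptions);
    in the real scraper they would come from the page loaded for each ID.
    """
    writer = csv.writer(out)
    writer.writerow(["id", "col1", "col2", "col3"])
    for batch in batches:
        for item_id in batch:
            writer.writerow([item_id, "col1-text", "col2-text", "col3-text"])


# Usage: write to an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_rows(new_id_list, buffer)
print(buffer.getvalue())
```

In the actual script, the file would be opened once before the loop (for example with open('file.csv', 'w', newline='')) and kept open while the rows are written. Opening it with mode 'w' inside the loop, as the question's code does, truncates the file on every iteration, and f.write accepts a single string, so passing '\n' as a second argument raises a TypeError.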

