Python BeautifulSoup - Scraping multiple pages and exporting the results to CSV

Problem description

I want to scrape some information from several different pages. The code below lets me view the scraped information with the print() function.

The problem is that I only get the data from the last page. The results from the earlier pages never make it into the CSV file. What should I do? Thanks.

Code:

import requests
from csv import writer
from bs4 import BeautifulSoup

urls = ['https://www.xxxxxxxxxxxxxxx/02-nb.php','https://www.xxxxxxxxxxxxxxx/03-np.php','https://www.xxxxxxxxxxxxxxx/04-nb.php']

for index,url in enumerate(urls):
    requests.get(url)
    page = requests.get(url)
    soup = BeautifulSoup(page.text, 'lxml')
    print(soup)
    table_data = soup.find('table')

with open("words.csv", "wt",newline='',encoding='utf-8') as csv_file:
    csv_data = writer(csv_file, delimiter =',')
    for voc in table_data.find_all('tr'):
        row_data = voc.find_all('td')
        row = [tr.text for tr in row_data]
        csv_data.writerow(row)

Tags: python, csv, web-scraping, beautifulsoup

Solution


You are iterating over each URL, but the logic you wrote to save the data to the CSV sits outside that for loop, so it only ever writes the last batch of data to the file. I believe what you want is:

for index, url in enumerate(urls):
    page = requests.get(url)          # fetch each page once
    soup = BeautifulSoup(page.text, 'lxml')
    table_data = soup.find('table')

    # Overwrite the file on the first URL, then append for the rest
    if index:
        mode = "a"
    else:
        mode = "w"

    with open("words.csv", mode, newline='', encoding='utf-8') as csv_file:
        csv_data = writer(csv_file, delimiter=',')
        for voc in table_data.find_all('tr'):
            row_data = voc.find_all('td')
            row = [td.text for td in row_data]
            csv_data.writerow(row)

This writes to words.csv on every iteration over urls, instead of looping over all of urls and only writing words.csv on the final iteration.
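
An alternative is to open words.csv once, before the loop, and reuse the same writer for every URL, which removes the append/overwrite mode bookkeeping entirely. A minimal sketch along those lines (the placeholder URLs are from the question; the None check on soup.find('table') is an extra defensive assumption, in case a page has no table):

import requests
from csv import writer
from bs4 import BeautifulSoup

urls = ['https://www.xxxxxxxxxxxxxxx/02-nb.php',
        'https://www.xxxxxxxxxxxxxxx/03-np.php',
        'https://www.xxxxxxxxxxxxxxx/04-nb.php']

# Open the file once; every iteration writes through the same csv writer.
with open("words.csv", "w", newline='', encoding='utf-8') as csv_file:
    csv_data = writer(csv_file, delimiter=',')
    for url in urls:
        page = requests.get(url)
        soup = BeautifulSoup(page.text, 'lxml')
        table_data = soup.find('table')
        if table_data is None:  # assumption: skip pages with no table
            continue
        for voc in table_data.find_all('tr'):
            row = [td.text for td in voc.find_all('td')]
            csv_data.writerow(row)

Because the file stays open for the whole loop, the rows from every page land in the same CSV without switching modes.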

