How do I get the text value for each scraped URL? I only get the last URL's value

Problem Description

I am new to Python. With this code I only keep the value of the last URL, but I want to scrape the content of every URL.

import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

contents = []
with open('c:\\users\\thegl\\documents\\datab.csv', 'r') as csvf:  # Open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        contents.append(url)  # Add each url to list contents

for url in contents:  # Parse through each url in the list.
    page = urlopen(url[0]).read()
    soup = BeautifulSoup(page, "html.parser")
# Note: this loop is not indented, so it runs only once, after the loop
# above has finished, and only sees the soup of the last URL.
for List in soup.find_all('ol', class_='breadcrumb'):
    for listext in List.find_all('li'):
        print(listext.text)

The file datab.csv contains the following URLs: https://www.dumpstool.com/1Y0-371-exam.html https://www.dumpstool.com/TK0-201-exam.html https://www.dumpstool.com/C9510-401-exam.html

Tags: python, beautifulsoup, screen-scraping

Solution


You should indent the last for loop so that it runs once per URL, inside the loop over contents. As written, it executes only after that loop has finished, at which point soup holds the parsed page of the last URL only.

import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

contents = []
with open('c:\\users\\thegl\\documents\\datab.csv', 'r') as csvf:  # Open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        contents.append(url)  # Add each url to list contents

for url in contents:  # Parse through each url in the list.
    page = urlopen(url[0]).read()
    soup = BeautifulSoup(page, "html.parser")
    for List in soup.find_all('ol', class_='breadcrumb'):  # Now inside the url loop
        for listext in List.find_all('li'):
            print(listext.text)
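As a further tidy-up, the parsing step can be pulled out into its own function, which also makes it testable without hitting the network. This is a sketch, not the answerer's code: the function names `breadcrumb_texts` and `scrape` are hypothetical, and it assumes the same CSV layout (one URL per row) and the same `ol.breadcrumb` markup as the original pages.

```python
import csv
from urllib.request import urlopen

from bs4 import BeautifulSoup


def breadcrumb_texts(html):
    """Return the text of every <li> inside <ol class="breadcrumb"> elements."""
    soup = BeautifulSoup(html, "html.parser")
    return [li.text
            for ol in soup.find_all('ol', class_='breadcrumb')
            for li in ol.find_all('li')]


def scrape(csv_path):
    """Fetch each URL listed in the CSV (one per row) and print its breadcrumbs."""
    with open(csv_path, newline='') as csvf:
        for row in csv.reader(csvf):
            if row:  # skip blank rows in the CSV
                for text in breadcrumb_texts(urlopen(row[0]).read()):
                    print(text)
```

Because `breadcrumb_texts` takes raw HTML rather than a URL, you can feed it a fixed string while debugging and only call `scrape` once the parsing works.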
