Python for loop does not iterate over all elements

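Problem description

I'm trying to write my first web scraper using Python and BeautifulSoup.

I'm trying to retrieve the URLs of all the listings on a web page, but instead of getting an array containing all the URLs, I only get a single URL.

Here is the code I'm using: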

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.pararius.com/apartments/enschede'

uClient = uReq(my_url)
page_html=uClient.read()
uClient.close()

page_soup = soup(page_html,"html.parser")

compartments = page_soup.findAll("li",{"class":"property-list-item-container"})

# Here is where I'm trying to store all the URLs in url_det
for compartment in compartments:
    url_det = compartment.h2.a["href"]

Any input is appreciated!

Tags: python, for-loop, web-scraping

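Solution

Each iteration of the loop rebinds url_det to a new value, overwriting the previous one, so only the URL from the last listing remains once the loop finishes. Instead, use a list comprehension to store all the values in a list, for example: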

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.pararius.com/apartments/enschede'

uClient = uReq(my_url)
page_html=uClient.read()
uClient.close()

page_soup = soup(page_html,"html.parser")

compartments = page_soup.findAll("li",{"class":"property-list-item-container"})

url_det = [compartment.h2.a["href"] for compartment in compartments]

print(url_det)
# Output: ['/house-for-rent/enschede/PR0001596564/otto-van-taverenstraat', ... , '/house-for-rent/enschede/PR0001594320/hanenberglanden']
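
If an explicit loop reads more naturally than a comprehension, the same result can be had by creating the list before the loop and appending to it. The snippet below is only a minimal sketch of that idea: to stay runnable without network access it uses a hypothetical `hrefs` list of made-up paths in place of the values that `compartment.h2.a["href"]` would return, and the names `overwritten`, `collected`, and `comprehended` are illustrative.

```python
# Hypothetical stand-in for the href values pulled from each listing;
# these paths are made up for illustration only.
hrefs = [
    "/house-for-rent/enschede/example-1",
    "/house-for-rent/enschede/example-2",
    "/house-for-rent/enschede/example-3",
]

# Pattern from the question: the name is rebound on every pass,
# so only the value from the last iteration survives.
for href in hrefs:
    overwritten = href

# Explicit-loop fix: create the list once, then append inside the loop.
collected = []
for href in hrefs:
    collected.append(href)

# Equivalent list comprehension, as used in the answer.
comprehended = [href for href in hrefs]

print(overwritten)                # the last path only
print(len(collected))             # number of collected paths
print(collected == comprehended)  # both approaches build the same list
```

In the scraper itself, the loop would presumably become `url_det = []` before the loop and `url_det.append(compartment.h2.a["href"])` inside it.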
