Web scraping in Python: trying to pull an "id" that changes

Problem description

Here is my code.

import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://steamcommunity.com/market/listings/730/Souvenir%20P2000%20%7C%20Chainmail%20%28Factory%20New%29'
# open and read
uClient = uReq(my_url)
page_html = uClient.read()
#close
uClient.close()
#html parse
page_soup = soup(page_html,"html.parser")
#grab all listings
containers = page_soup.findAll("div",{"class":"market_listing_item_name_block"})

for container in containers:
    block_container = container.findAll("span",{"class":"market_listing_item_name"})

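This returns several block_container results that are all identical, except for the <span> id, which has the form id="listing_#_name", where # is a string of digits that differs for each <span>.

For example: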

</br></div>, <div class="market_listing_item_name_block">
<span class="market_listing_item_name" id="listing_2060891817875196312_name" style="color: #FFD700;">Souvenir P2000 | Chainmail (Factory New)</span>
<br/>

<span class="market_listing_game_name">Counter-Strike: Global Offensive</span>
</div>, <div class="market_listing_item_name_block">
<span class="market_listing_item_name" id="listing_2076653149485426829_name" style="color: #FFD700;">Souvenir P2000 | Chainmail (Factory New)</span>
<br/>

Can anyone explain how I can get the id from every one of these spans?

Tags: python, html, beautifulsoup

Solution


You can read the id from the attributes of each span tag.

Try:

for container in containers:
    for block_container in container.findAll("span", class_="market_listing_item_name"):
        print(block_container.attrs['id'])
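
The loop above can also be tried without touching the network. The following is only a minimal sketch, assuming an HTML fragment shaped like the output shown in the question; `sample_html`, `LISTING_ID`, and the helper `listing_ids` are hypothetical names introduced here for illustration. It uses `span.get("id")`, which returns None instead of raising when a span has no id, and a regular expression to pull the digits out of `listing_<digits>_name`.

```python
import re
from bs4 import BeautifulSoup

# Assumed sample markup, modeled on the output shown in the question.
sample_html = """
<div class="market_listing_item_name_block">
<span class="market_listing_item_name" id="listing_2060891817875196312_name" style="color: #FFD700;">Souvenir P2000 | Chainmail (Factory New)</span>
<br/>
<span class="market_listing_game_name">Counter-Strike: Global Offensive</span>
</div>
<div class="market_listing_item_name_block">
<span class="market_listing_item_name" id="listing_2076653149485426829_name" style="color: #FFD700;">Souvenir P2000 | Chainmail (Factory New)</span>
<br/>
<span class="market_listing_game_name">Counter-Strike: Global Offensive</span>
</div>
"""

# Matches ids of the form listing_<digits>_name and captures the digits.
LISTING_ID = re.compile(r"^listing_(\d+)_name$")


def listing_ids(html):
    """Return (full id, numeric part) pairs for every item-name span."""
    page_soup = BeautifulSoup(html, "html.parser")
    results = []
    for span in page_soup.find_all("span", class_="market_listing_item_name"):
        span_id = span.get("id")  # None if the span has no id attribute
        match = LISTING_ID.match(span_id or "")
        if match:
            results.append((span_id, match.group(1)))
    return results


for full_id, number in listing_ids(sample_html):
    print(full_id, number)
```

Spans without an id, or with an id in some other format, are skipped rather than causing an error, which may be preferable if the page layout changes.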

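From the Beautiful Soup documentation:

A tag may have any number of attributes. The tag <b id="boldest"> has an attribute "id" whose value is "boldest". You can access a tag's attributes by treating the tag like a dictionary: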

tag['id']
# 'boldest'

You can access that dictionary directly as .attrs:

tag.attrs
# {'id': 'boldest'}
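
To make the difference between these access styles concrete, here is a small sketch, assuming a made-up two-tag document (the markup and the variable names are illustrative only). Dictionary-style indexing raises KeyError when the attribute is absent, while `.get()` returns None, which matters when some spans on a scraped page lack an id.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one tag with an id, one without.
doc = BeautifulSoup('<b id="boldest">bold</b><i>plain</i>', "html.parser")
with_id = doc.b
without_id = doc.i

id_by_index = with_id["id"]      # dictionary-style access
id_by_get = with_id.get("id")    # same value, but safe when missing
attrs_copy = dict(with_id.attrs) # plain dict copy of all attributes

missing = without_id.get("id")   # None, no exception

try:
    without_id["id"]             # indexing a missing attribute
    raised = False
except KeyError:
    raised = True

print(id_by_index, id_by_get, attrs_copy, missing, raised)
```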


