Problem description

How do I get specific items that share the same class name and attributes?

I need to get these 3 items:

Apr 14, 2013

580

Fort Pierce, FL

<dl class="pairsJustified">
<dt>Joined:</dt>
<dd>Apr 14, 2013</dd>
</dl>
<dl class="pairsJustified">
<dt>Messages:</dt>
<dd><a href="search/member?user_id=13302" class="concealed" 
rel="nofollow">580</a></dd>
</dl>

<dl class="pairsJustified">
<dt>Location:</dt>
<dd>
<a href="misc/location-info?location=Fort+Pierce%2C+FL" target="_blank" 
rel="nofollow noreferrer" itemprop="address" class="concealed">Fort 
Pierce, FL</a>

Tags: python, web-scraping, scrapy

Solution

All three values sit inside <dd> tags, so you can collect them with .find_all("dd") and read the text of each match.

from bs4 import BeautifulSoup

test = '''<dl class="pairsJustified">
<dt>Joined:</dt>
<dd>Apr 14, 2013</dd>
</dl>
<dl class="pairsJustified">
<dt>Messages:</dt>
<dd><a href="search/member?user_id=13302" class="concealed" 
rel="nofollow">580</a></dd>
</dl>

<dl class="pairsJustified">
<dt>Location:</dt>
<dd>
<a href="misc/location-info?location=Fort+Pierce%2C+FL" target="_blank" 
rel="nofollow noreferrer" itemprop="address" class="concealed">Fort Pierce, FL</a>'''

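# Parse the snippet with the parser from the standard library.
soup = BeautifulSoup(test, 'html.parser')
# Every value of interest is wrapped in a <dd> element.
data = soup.find_all("dd")
for d in data:
    # .text includes the text of nested tags such as <a>; strip() drops surrounding whitespace.
    print(d.text.strip())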

Output

Apr 14, 2013
580
Fort Pierce, FL
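
If you need one specific value rather than all three, one possible approach is to key each <dd> by the <dt> label that shares its <dl>. The sketch below is an illustration, not part of the original answer: the helper name pairs_by_label is hypothetical, and the sample HTML is the snippet from the question with the closing </dd> and </dl> tags added on the assumption that the real page has them.

```python
from bs4 import BeautifulSoup

# Sample markup from the question; the final </dd></dl> are assumed.
html = '''<dl class="pairsJustified">
<dt>Joined:</dt>
<dd>Apr 14, 2013</dd>
</dl>
<dl class="pairsJustified">
<dt>Messages:</dt>
<dd><a href="search/member?user_id=13302" class="concealed" rel="nofollow">580</a></dd>
</dl>
<dl class="pairsJustified">
<dt>Location:</dt>
<dd>
<a href="misc/location-info?location=Fort+Pierce%2C+FL" target="_blank" rel="nofollow noreferrer" itemprop="address" class="concealed">Fort Pierce, FL</a>
</dd>
</dl>'''


def pairs_by_label(soup):
    """Hypothetical helper: map each <dt> label (colon removed) to its <dd> text."""
    result = {}
    # class_ restricts the search to the <dl> blocks that share the class name.
    for dl in soup.find_all("dl", class_="pairsJustified"):
        dt = dl.find("dt")
        dd = dl.find("dd")
        if dt is None or dd is None:
            continue  # skip blocks that lack a label or a value
        label = dt.get_text(strip=True).rstrip(":")
        result[label] = dd.get_text(strip=True)
    return result


soup = BeautifulSoup(html, "html.parser")
info = pairs_by_label(soup)
print(info["Location"])
```

With the values keyed this way, info["Joined"], info["Messages"], and info["Location"] each pick out a single item, and the lookup does not depend on the order in which the <dl> blocks appear on the page.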
