Unable to get results using BeautifulSoup

Problem description

I have an HTML page.

I want to extract the value of the "href" attribute from every <a> tag.

Here is the HTML page:

<div class="universal">
<div class="slider">
    <a class="focus" href="/1295%2C"><div><div><div>St</div></div></div></a>,
    <a class="focus" href="/2395%2C"><div><div><div>GT</div></div></div></a>
</div>
<div class="slider">
    <a class="focus" href="/3495%2C"><div><div><div>KT</div></div></div></a>,
    <a class="focus" href="/4595%2C"><div><div><div>LT</div></div></div></a>
</div>
<div class="slider">
    <a class="focus" href="/5695%2C"><div><div><div>OT</div></div></div></a>,
    <a class="focus" href="/6795%2C"><div><div><div>OT</div></div></div></a>,
    <a class="focus" href="/7895%2C"><div><div><div>OT</div></div></div></a>
</div>

I tried the following code:

from bs4 import BeautifulSoup
response = html_page
html_text = BeautifulSoup(response, "html.parser")
shows = html_text.find('div', {'class': 'slider'}).findAll('a', {'class': 'focus'})

urls = []
for a_tag in shows :
    urls.append(a_tag.find('a', {'class': 'focus'}).attrs['href'])
print urls

It fails with: 'NoneType' object has no attribute 'findAll'. Please help.

Tags: python-2.7, beautifulsoup

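Solution

One approach is to use find_all at both levels. find returns only the first matching div (or None when nothing matches), and in the loop from the question each a_tag is already an <a> element, so calling find('a', ...) on it returns None.

Demo: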

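from bs4 import BeautifulSoup

# html is assumed to hold the page source shown in the question
html_text = BeautifulSoup(html, "html.parser")
# find_all returns every matching div, not just the first one
shows = html_text.find_all('div', {'class': 'slider'})

urls = []
for div in shows:
    # each a_tag is already an <a> element, so read href from it directly
    for a_tag in div.find_all('a', {'class': 'focus'}):
        urls.append(a_tag.attrs['href'])
print(urls)  # the parenthesized form works on Python 2 and 3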

Output:

[u'/1295%2C', u'/2395%2C', u'/3495%2C', u'/4595%2C', u'/5695%2C', u'/6795%2C', u'/7895%2C']
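The nested find_all loops can also be written as a single CSS selector. The following is a minimal sketch, assuming Python 3 and a bs4 release that provides select(). The html_page variable and its shortened markup are stand-ins for the real page, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Shortened sample markup based on the question; html_page is a stand-in name.
html_page = """
<div class="universal">
<div class="slider">
    <a class="focus" href="/1295%2C"><div><div><div>St</div></div></div></a>,
    <a class="focus" href="/2395%2C"><div><div><div>GT</div></div></div></a>
</div>
<div class="slider">
    <a class="focus" href="/3495%2C"><div><div><div>KT</div></div></div></a>
</div>
</div>
"""

soup = BeautifulSoup(html_page, "html.parser")

# "div.slider a.focus" matches every <a class="focus"> nested inside
# any <div class="slider">, so no explicit loop over the divs is needed.
urls = [a_tag["href"] for a_tag in soup.select("div.slider a.focus")]
print(urls)  # ['/1295%2C', '/2395%2C', '/3495%2C']
```

select() returns an empty list when nothing matches, so this form does not raise the NoneType error when the page has no slider div. It simply yields an empty urls list.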
