Python web scraping: nested tags

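Problem description

I am trying to get information from the following page:

http://books.toscrape.com/

I want to get the rating (number of stars) of each book, and I used the code below: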

import re

import requests
from bs4 import BeautifulSoup

response = requests.get('http://books.toscrape.com/')
if response.status_code == 200:
    print('Request successful!')

# Parse the HTML so that `soup` exists before it is searched.
soup = BeautifulSoup(response.text, 'html.parser')

# Match every tag that has a class containing "rating".
linhas = soup.find_all(class_=re.compile("rating"))

But what I get back is:

<p class="star-rating Three">
<i class="icon-star"></i>
<i class="icon-star"></i>
<i class="icon-star"></i>
<i class="icon-star"></i>
<i class="icon-star"></i>
</p>,

What exactly am I doing wrong?

Tags: python, web-scraping

Solution


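The star value is actually part of the class name. BeautifulSoup returns the class attribute as a list (here ['star-rating', 'Three']), so the second item is the rating. You can read it with d.attrs['class'][1], or with the shorthand d['class'][1].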

import requests
from bs4 import BeautifulSoup

response = requests.get('http://books.toscrape.com/')

soup = BeautifulSoup(response.text, "html.parser")

# Select every <p> tag that carries the "star-rating" class.
data = soup.find_all("p", class_="star-rating")
for d in data:
    # The class list looks like ['star-rating', 'Three']; item 1 is the rating.
    print(d.attrs['class'][1])

Output:

Three
One
One
Four
..
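
If you need the rating as a number rather than a word, one option is to look the word up in a small table. The sketch below is only an illustration: the names RATING_WORDS, rating_from_classes and ratings_from_html are hypothetical helpers, not part of BeautifulSoup, and the five rating words are an assumption based on the markup shown in the question.

```python
# Assumed set of rating words, based on the class names seen in the question.
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def rating_from_classes(classes):
    """Return the numeric rating found in a list of CSS classes, or None."""
    for name in classes:
        if name in RATING_WORDS:
            return RATING_WORDS[name]
    return None


def ratings_from_html(html):
    """Return the numeric rating of every <p class="star-rating ..."> tag."""
    # Imported here so rating_from_classes stays usable without bs4 installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    return [
        rating_from_classes(p.get("class", []))
        for p in soup.find_all("p", class_="star-rating")
    ]
```

For example, rating_from_classes(['star-rating', 'Three']) returns 3, and ratings_from_html(response.text) would give one number per book on the page. Looking the word up by name, instead of relying on position 1 in the list, also keeps working if the order of the classes ever changes.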
