Python Beautiful Soup find_all

Problem description

Hi, I am trying to scrape some information from a website. Please forgive any formatting mistakes; this is my first post on SO.

soup.find('div', {"class":"stars"}) 

This returns:

<div class="stars" title="4.0 star rating">
<i class="star star--large star-0"></i>
<i class="star star--large star-1"></i>
<i class="star star--large star-2"></i>
<i class="star star--large star-3"></i>
<i class="star star--large star-4 star--large--muted"></i>
</div>

From this I need the title value, "4.0 star rating".

When I use:

soup.find('div', {"class":"stars"})["title"]

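it works, but the same indexing does not work on the result of find_all. I am trying to find every occurrence and put the values into a list.

Here is my full code: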

    def get_info():
        from IPython.display import HTML
        import requests
        from bs4 import BeautifulSoup
        n = 1
        for page in range(53):
            url = f"https://www.sitejabber.com/reviews/apple.com?page={n}&sort=Reviews.processed&direction=DESC#reviews"
            r = requests.get(url)
            soup = BeautifulSoup(r.text, 'lxml')
            all_reviews = soup.find_all('div', {'class':"truncate_review"})
            all_dates = soup.find_all('div', {'class':'review__date'},'title')
            all_titles = soup.find_all('span', {'class':'review__title__text'})
            reviews_class = soup.find('div', {"class":"review__stars"})
            for review in all_reviews:
                all_reviews_list.append(review.text.replace("\n","").replace("\t",""))
            for date in all_dates:
                all_dates_list.append(date.text.replace("\n","").replace("\t",""))
            for title in all_titles:
                all_titles_list.append(title.text.replace("\n","").replace("\t",""))
            for stars in reviews_class.find_all('div', {'class':'stars'}):
                all_star_ratings.append(stars['title'])
            n += 1

Sorry if the indentation is a little off; this is my full code.

Tags: python, beautifulsoup

Solution


You read the attributes of a bs4 element the same way you index a dictionary.
If you are using find():

soup.find('div', {"class":"stars"})['title']

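This works because find() returns a single element.
find_all(), however, returns a list, and indexing a list with a string (list['title']) is not a valid operation; it raises a TypeError.
So you index each element in turn and collect the results in a list: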

res = []
for i in soup.find_all('div', {"class":"stars"}):
    res.append(i['title'])

Or, as a one-liner:

res = [i['title'] for i in soup.find_all('div', {"class":"stars"})]
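
The difference between the two calls can be checked without touching the network. The following is a minimal sketch, assuming a small hand-written HTML string (SAMPLE_HTML is hypothetical, modelled on the markup in the question) and the built-in "html.parser" backend instead of lxml:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the snippet in the question.
SAMPLE_HTML = """
<div class="stars" title="4.0 star rating"><i class="star star-0"></i></div>
<div class="stars" title="1.0 star rating"><i class="star star-0"></i></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns a single Tag, which supports dictionary-style attribute access.
first_title = soup.find("div", {"class": "stars"})["title"]

# find_all() returns a list-like ResultSet, so a string index is rejected.
try:
    soup.find_all("div", {"class": "stars"})["title"]
    string_index_failed = False
except TypeError:
    string_index_failed = True

# Indexing each Tag inside the result works as expected.
all_titles = [tag["title"] for tag in soup.find_all("div", {"class": "stars"})]

print(first_title)
print(string_index_failed)
print(all_titles)
```

The try/except is only there to make the failure visible; in real code you would go straight to the loop or the list comprehension.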

Since you want the rating title of every review, you also need to restrict the search to the review container, that is, scrape from:

<div class="review__container">

So the code becomes:

review = soup.find_all('div',class_="review__container")
res = [i['title'] for j in review for i in j.find_all('div',class_='stars')]

Which gives:

['1.0 star rating', '1.0 star rating', '3.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '5.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '5.0 star rating', '2.0 star rating', '5.0 star rating', '1.0 star rating', '2.0 star rating', '1.0 star rating', '5.0 star rating', '1.0 star rating', '5.0 star rating']
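
If it helps, the container-scoped lookup can be wrapped in a small function that takes the page HTML as a string, which keeps the parsing separate from the requests call and easy to try on sample data. This is only a sketch under stated assumptions: extract_ratings, rating_value and SAMPLE_PAGE are hypothetical names introduced here, the sample markup is invented to show why scoping matters (it contains a div.stars outside any review container and one without a title), and "html.parser" is used instead of lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment: one rating outside the review containers,
# two reviews with a title, and one review whose stars div has no title.
SAMPLE_PAGE = """
<div class="stars" title="2.5 star rating"></div>
<div class="review__container">
  <div class="stars" title="1.0 star rating"></div>
</div>
<div class="review__container">
  <div class="stars"></div>
</div>
<div class="review__container">
  <div class="stars" title="5.0 star rating"></div>
</div>
"""


def extract_ratings(html):
    """Return the title of every div.stars nested in a div.review__container."""
    soup = BeautifulSoup(html, "html.parser")
    ratings = []
    for container in soup.find_all("div", class_="review__container"):
        for stars in container.find_all("div", class_="stars"):
            title = stars.get("title")  # None when the attribute is absent
            if title is not None:
                ratings.append(title)
    return ratings


def rating_value(title):
    """Turn a string such as '4.0 star rating' into the float 4.0."""
    return float(title.split()[0])


ratings = extract_ratings(SAMPLE_PAGE)
values = [rating_value(t) for t in ratings]
print(ratings)
print(values)
```

Using stars.get("title") rather than stars["title"] avoids a KeyError on elements that lack the attribute. In the paging loop from the question, you could call extract_ratings(r.text) once per page and extend all_star_ratings with the result.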
