Get the element that follows another element in a string

Problem description

I have some HTML parsed with BeautifulSoup, and I want to extract the following content: star0 sa2

>>> short_comment[1]['name']

<div class="author">
   <a href="/member/?id=59465221" target="_blank">唐牛</a>
    <span class="star0 sa2"></span></div>

I tried a regular expression, star0\s[a-zA-Z0-9], but it returned nothing. Now I am trying to replace each < with > and split the resulting string:

>>> s = s.replace('<','>')
>>> s.split('>')
['', 'div class="author"', ' ', 'a href="/member/?id=59465221" target="_blank"', '唐牛', '/a', ' ', 'span class="star0 sa2"', '', '/span', '', '/div', '']
>>> s.find("star0")

I also tried using BS4 to pull the class out of the elements that match the "author" class:

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:82.0) Gecko/20100101 Firefox/82.0'}
base_url = 'https://www.nosetime.com'

def get_perfume_as_dict(url):
    print(base_url + url)
    response_unicode = requests.get(base_url + url, headers=headers)
    soup = BeautifulSoup(response_unicode.text, 'html.parser')
    perfume = {}
    # Pair each author block with its comment block, in document order.
    perfume["short_comment"] = [
        {"name": name.text,
         "rating": name.span['class'][1],
         "comment": comment.text}
        for name, comment in zip(
            soup.find_all('div', {'class': "author"}),
            soup.find_all('div', {'class': "hfshow1"}),
        )
    ]
    return perfume

But when I run it, it seems to get stuck in a loop:

get_perfume_as_dict("/xiangshui/350870-oulong-atelier-cologne-oolang-infini.html")

Tags: python, python-3.x, text

Solution


Use BeautifulSoup to query your HTML.

Example:

from bs4 import BeautifulSoup

short_comment = """<div class="author">
   <a href="/member/?id=59465221" target="_blank">唐牛</a>
    <span class="star0 sa2"></span></div>"""
   
soup = BeautifulSoup(short_comment, "html.parser")
print(soup.find("div", {'class':'author'}).span['class'])

Output:

['star0', 'sa2']
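
The same lookup can be applied to every author block on a page. The following is only a minimal sketch, not the asker's final code: the helper name `extract_ratings` is hypothetical, the second `div` in the sample markup is invented to show a block with no `<span>`, and treating the class that is not `star0` as the rating is an assumption based on the single example in the question.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the question. The second block is a made-up
# case without a <span>, included only to show the fallback.
html = """
<div class="author">
   <a href="/member/?id=59465221" target="_blank">唐牛</a>
    <span class="star0 sa2"></span></div>
<div class="author">
   <a href="/member/?id=1" target="_blank">anonymous</a></div>
"""

def extract_ratings(markup):
    """Return one {"name", "rating"} dict per div.author in the markup."""
    soup = BeautifulSoup(markup, "html.parser")
    results = []
    for author in soup.find_all("div", class_="author"):
        link = author.find("a")
        span = author.find("span")
        # BeautifulSoup returns the class attribute as a list of strings.
        classes = span.get("class", []) if span else []
        # Assumption: the rating is the class other than the "star0" marker.
        rating = next((c for c in classes if c != "star0"), None)
        results.append({
            "name": link.get_text(strip=True) if link else None,
            "rating": rating,
        })
    return results

print(extract_ratings(html))
```

Checking for a missing `<span>` before reading its classes avoids the `TypeError` that `name.span['class'][1]` would raise when an author block has no rating.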
