BS4 web scraping: getting text from multiple elements

Problem description

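I am trying to extract several elements as text from the result of bs4's find_all, without success.

I have been trying get_text and a "for x in y" loop:

Any advice would be appreciated!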

import requests
import re
from bs4 import BeautifulSoup as bs

URL = "URLLINK"

r = requests.get(URL)

soup = bs(r.content, "html.parser")

data = soup.find_all('span', attrs= {"class": "XXX"})

print(data)

Tags: web-scraping, beautifulsoup

Solution


It is better to use a CSS selector with the .select() method, then read the text of each matched element in a loop:

from bs4 import BeautifulSoup

html = '''<span class="sold-property-listing__subheading sold-property-listing--left">         Slutpris 1 400 000 kr </span>, <span class="sold-property-listing__subheading sold-property-listing--left">         Slutpris 1 950 000 kr </span>, <span class="sold-property-listing__subheading sold-property-listing--left">         Slutpris 2 115 000 kr </span>, <span class="sold-property-listing__subheading sold-property-listing--left">         Slutpris 1 900 000 kr</span>'''

soup = BeautifulSoup(html, 'html.parser')
spans = soup.select('.sold-property-listing__subheading')

# or
# spans = soup.select('.sold-property-listing__subheading.sold-property-listing--left')

for s in spans:
    # .text keeps the surrounding whitespace; s.get_text(strip=True) would trim it
    print(s.text)
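
The attempt in the question most likely fails because find_all returns a list-like ResultSet of Tag objects, and get_text exists on each Tag, not on the list. Below is a minimal sketch of the same loop using find_all, assuming the markup looks like the sample above. The parse_price helper is hypothetical and not part of the original answer. It only shows one way to turn the extracted text into a number.

```python
import re
from bs4 import BeautifulSoup

html = '''<span class="sold-property-listing__subheading sold-property-listing--left">
    Slutpris 1 400 000 kr </span>
<span class="sold-property-listing__subheading sold-property-listing--left">
    Slutpris 1 950 000 kr </span>'''

soup = BeautifulSoup(html, "html.parser")

# find_all returns a ResultSet (a list of Tag objects), so get_text
# has to be called on each element rather than on the list itself.
spans = soup.find_all("span", class_="sold-property-listing__subheading")
texts = [span.get_text(strip=True) for span in spans]


def parse_price(text):
    """Hypothetical helper: keep only the digits of a string like 'Slutpris 1 400 000 kr'."""
    digits = re.sub(r"\D", "", text)
    return int(digits) if digits else None


prices = [parse_price(t) for t in texts]

print(texts)
print(prices)
```

Passing strip=True removes the leading and trailing whitespace that .text would keep.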
