python - How to convert a bs4 findAll() result to strings
Problem description
Here is my code:
with requests.Session() as s:
    r = s.get(url, headers=headers)
    soup = BeautifulSoup(r.text, 'html.parser')
    sizes = soup.findAll(True, {'class': 'product__sizes-size-1'})
I want the items in sizes to be strings rather than tags, so that I can do this:
parsed_sizes = [item for item in sizes if 1 <= item <= 20]
That comparison needs a string to work with. Right now, printing sizes outputs:
[<span class="product__sizes-size-1">6</span>, <span class="product__sizes-size-1">6.5</span>, <span class="product__sizes-size-1">7</span>, <span class="product__sizes-size-1">7.5</span>, <span class="product__sizes-size-1">8</span>, <span class="product__sizes-size-1">8.5</span>, <span class="product__sizes-size-1">9</span>, <span class="product__sizes-size-1"></span>, <span class="product__sizes-size-1"></span>, <span class="product__sizes-size-1"></span>, <span class="product__sizes-size-1"></span>, <span class="product__sizes-size-1"></span>, <span class="product__sizes-size-1"></span>, <span class="product__sizes-size-1"></span>]
If I call type() on it, I get <class 'bs4.element.ResultSet'>
Solution
You need to get the text of each tag and convert it to a number; after that the comparison should work.
For example:
from bs4 import BeautifulSoup
sizes = """[<span class="product__sizes-size-1">6</span>, <span class="product__sizes-size-1">6.5</span>, <span class="product__sizes-size-1">7</span>, <span class="product__sizes-size-1">7.5</span>, <span class="product__sizes-size-1">8</span>, <span class="product__sizes-size-1">8.5</span>, <span class="product__sizes-size-1">9</span>, <span class="product__sizes-size-1"></span>, <span class="product__sizes-size-1"></span>, <span class="product__sizes-size-1"></span>, <span class="product__sizes-size-1"></span>, <span class="product__sizes-size-1"></span>, <span class="product__sizes-size-1"></span>, <span class="product__sizes-size-1"></span>]
"""
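# string=True keeps only tags that contain text, so the empty <span> elements are skipped
# (older Beautiful Soup versions spell this argument text=True)
tags = BeautifulSoup(sizes, "html.parser").find_all(True, {'class': 'product__sizes-size-1'}, string=True)
# Keep the text of each tag whose numeric value lies between 1 and 20
parsed_sizes = [
    item.get_text(strip=True) for item in tags
    if 1 <= float(item.get_text(strip=True)) <= 20
]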
print(parsed_sizes)
Output:
['6', '6.5', '7', '7.5', '8', '8.5', '9']
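If the filtered values are needed as numbers rather than strings, or if some tags might hold non-numeric text, a slightly more defensive variant may help. The following is only a sketch: the `to_size` helper and the shortened sample markup (including the `21` and `N/A` entries) are hypothetical and not part of the original question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, shortened and extended for illustration
html = (
    '<span class="product__sizes-size-1">6</span>'
    '<span class="product__sizes-size-1">6.5</span>'
    '<span class="product__sizes-size-1">21</span>'
    '<span class="product__sizes-size-1"></span>'
    '<span class="product__sizes-size-1">N/A</span>'
)


def to_size(tag):
    """Return the tag's text as a float, or None if it is empty or not numeric."""
    text = tag.get_text(strip=True)
    try:
        return float(text)
    except ValueError:
        return None


tags = BeautifulSoup(html, "html.parser").find_all(class_="product__sizes-size-1")
numbers = [to_size(tag) for tag in tags]
# Drop tags that could not be parsed, then apply the range check
parsed_sizes = [n for n in numbers if n is not None and 1 <= n <= 20]
print(parsed_sizes)  # [6.0, 6.5]
```

Converting once inside a helper avoids calling get_text() twice per tag, and the try/except means an unexpected label does not abort the whole list comprehension.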