python - How to scrape paragraph data under multiple tags
Problem description
<div class="woocommerce-Tabs-panel woocommerce-Tabs-panel--specification panel entry-content wc-tab" id="tab-specification" role="tabpanel" aria-labelledby="tab-title-specification" style="display: block;">
<table class="woocommerce-product-attributes shop_attributes">
<tbody>
<tr class="woocommerce-product-attributes-item woocommerce-product-attributes-item--weight">
<th class="woocommerce-product-attributes-item__label">Weight</th>
<td class="woocommerce-product-attributes-item__value">1.6 kg</td>
</tr>
<tr class="woocommerce-product-attributes-item woocommerce-product-attributes-item--attribute_pa_brands">
<th class="woocommerce-product-attributes-item__label">brands</th>
<td class="woocommerce-product-attributes-item__value"><p><a href="https://khusheimstore.com/brands/makita/" rel="tag">MAKITA</a></p></td>
</tr>
</tbody>
</table>
<p>
Capacity<br>
Steel : 10mm (3/8″)<br>
Wood : 21mm (13/16″)<br>
Masonry : 8mm (5/16″)<br>
Impacts per minute (ipm)<br>
Impact-driver mode: 0 – 3,200<br>
Hammer drill mode: 0 – 27,600<br>
No load speed (rpm)<br>
Impact-driver mode: 0 – 2,300<br>
Drill mode (Hi / Lo): 0-2,300 / 0-700<br>
Screwdriver mode: 0 – 2,300<br>
Max fastening torque<br>
Impact-driver mode: 140N•m (1,240in.lbs)<br>
Drill mode (Hard/ Soft): 50/ 10N•m<br>
Dimensions (L x W x H)<br>
186 x 79 x 246mm<br>
Net weight<br>
1.6kg (3.6lbs)<br>
Standard Equipment: 1 Phillips Bit.<br>
Model comes without Battery and Charger
</p>
This is the HTML I need to scrape, from <p>Capacity<br> to the end.
Code
import requests
from bs4 import BeautifulSoup

url = "https://khusheimstore.com/product/makita-cordless-4-mode-impact-driver-for-18vli-ion-dtp140z-dtp140z-220/"
headers = {"Accept-Language": "en-US, en;q=0.5"}
results = requests.get(url, headers=headers)
soup = BeautifulSoup(results.text, "html.parser")
# initiate data storage
Techspec = []
Techspec.append(soup.findAll('div', attrs={"id": "woocommerce-Tabs-panel woocommerce-Tabs-panel--specification panel entry-content wc-tab"}))
Solution
The data you need is in the second <p> tag. Select that tag first, then extract the data from it.
Here is how it can be done.
import bs4 as bs
s = """
<div class="woocommerce-Tabs-panel woocommerce-Tabs-panel--specification panel entry-content wc-tab" id="tab-specification" role="tabpanel" aria-labelledby="tab-title-specification" style="display: block;">
<table class="woocommerce-product-attributes shop_attributes">
<tbody><tr class="woocommerce-product-attributes-item woocommerce-product-attributes-item--weight">
<th class="woocommerce-product-attributes-item__label">Weight</th>
<td class="woocommerce-product-attributes-item__value">1.6 kg</td>
</tr>
<tr class="woocommerce-product-attributes-item woocommerce-product-attributes-item--attribute_pa_brands">
<th class="woocommerce-product-attributes-item__label">brands</th>
<td class="woocommerce-product-attributes-item__value"><p><a href="https://khusheimstore.com/brands/makita/" rel="tag">MAKITA</a></p>
</td>
</tr>
</tbody></table>
<p>Capacity<br>
Steel : 10mm (3/8″)<br>
Wood : 21mm (13/16″)<br>
Masonry : 8mm (5/16″)<br>
Impacts per minute (ipm)<br>
Impact-driver mode: 0 – 3,200<br>
Hammer drill mode: 0 – 27,600<br>
No load speed (rpm)<br>
Impact-driver mode: 0 – 2,300<br>
Drill mode (Hi / Lo): 0-2,300 / 0-700<br>
Screwdriver mode: 0 – 2,300<br>
Max fastening torque<br>
Impact-driver mode: 140Nm (1,240in.lbs)<br>
Drill mode (Hard/ Soft): 50/ 10Nm<br>
Dimensions (L x W x H)<br>
186 x 79 x 246mm<br>
Net weight<br>
1.6kg (3.6lbs)<br>
Standard Equipment: 1 Phillips Bit.<br>
Model comes without Battery and Charger</p>
"""
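soup = bs.BeautifulSoup(s, 'lxml')
# The first <p> sits inside the brands cell; the specification paragraph is the second one.
p = soup.find_all('p')[1]
# stripped_strings yields each text fragment between the <br> tags, already stripped.
for line in p.stripped_strings:
    print(line)
Output: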
Capacity
Steel : 10mm (3/8″)
Wood : 21mm (13/16″)
Masonry : 8mm (5/16″)
Impacts per minute (ipm)
Impact-driver mode: 0 – 3,200
Hammer drill mode: 0 – 27,600
No load speed (rpm)
Impact-driver mode: 0 – 2,300
Drill mode (Hi / Lo): 0-2,300 / 0-700
Screwdriver mode: 0 – 2,300
Max fastening torque
Impact-driver mode: 140Nm (1,240in.lbs)
Drill mode (Hard/ Soft): 50/ 10Nm
Dimensions (L x W x H)
186 x 79 x 246mm
Net weight
1.6kg (3.6lbs)
Standard Equipment: 1 Phillips Bit.
Model comes without Battery and Charger
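The attempt in the question passes the panel's class string as its id, so it matches nothing; in the HTML shown, the panel's id is tab-specification. Indexing `find_all('p')[1]` also depends on how many <p> tags come before the specification paragraph. A minimal sketch of a less position-dependent variant follows, assuming the page keeps id="tab-specification" and keeps the specification paragraph as a direct child of that panel, as in the snippet above. The helper name `extract_spec_lines` and the trimmed HTML are illustrative only, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the specification panel from the question (illustrative only).
html = """
<div class="woocommerce-Tabs-panel wc-tab" id="tab-specification">
<table><tbody><tr>
<th>brands</th>
<td><p><a href="https://khusheimstore.com/brands/makita/" rel="tag">MAKITA</a></p></td>
</tr></tbody></table>
<p>Capacity<br>
Steel : 10mm (3/8″)<br>
Net weight<br>
1.6kg (3.6lbs)</p>
</div>
"""


def extract_spec_lines(markup):
    """Return the text lines of the <p> that is a direct child of the spec panel."""
    soup = BeautifulSoup(markup, "html.parser")
    # Look the panel up by its id attribute rather than by its class string.
    panel = soup.find("div", id="tab-specification")
    if panel is None:
        return []
    # recursive=False skips the <p> nested inside the attributes table.
    para = panel.find("p", recursive=False)
    if para is None:
        return []
    return list(para.stripped_strings)


spec_lines = extract_spec_lines(html)
for line in spec_lines:
    print(line)
```

To run this against the live page, pass `results.text` from the question's `requests.get` call instead of `html`. If the site nests the paragraph differently from the snippet, the `recursive=False` lookup returns nothing and the selector would need adjusting.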