How to scrape paragraph data under multiple tags

Problem description

<div class="woocommerce-Tabs-panel woocommerce-Tabs-panel--specification panel entry-content wc-tab" id="tab-specification" role="tabpanel" aria-labelledby="tab-title-specification" style="display: block;">
    <table class="woocommerce-product-attributes shop_attributes">
        <tbody>
            <tr class="woocommerce-product-attributes-item woocommerce-product-attributes-item--weight">
                <th class="woocommerce-product-attributes-item__label">Weight</th>
                <td class="woocommerce-product-attributes-item__value">1.6 kg</td>
            </tr>
            <tr class="woocommerce-product-attributes-item woocommerce-product-attributes-item--attribute_pa_brands">
                <th class="woocommerce-product-attributes-item__label">brands</th>
                <td class="woocommerce-product-attributes-item__value"><p><a href="https://khusheimstore.com/brands/makita/" rel="tag">MAKITA</a></p></td>
            </tr>
        </tbody>
    </table>
    <p>
        Capacity<br>
        Steel : 10mm (3/8″)<br>
        Wood : 21mm (13/16″)<br>
        Masonry : 8mm (5/16″)<br>
        Impacts per minute (ipm)<br>
        Impact-driver mode: 0 – 3,200<br>
        Hammer drill mode: 0 – 27,600<br>
        No load speed (rpm)<br>
        Impact-driver mode: 0 – 2,300<br>
        Drill mode (Hi / Lo): 0-2,300 / 0-700<br>
        Screwdriver mode: 0 – 2,300<br>
        Max fastening torque<br>
        Impact-driver mode: 140N•m (1,240in.lbs)<br>
        Drill mode (Hard/ Soft): 50/ 10N•m<br>
        Dimensions (L x W x H)<br>
        186 x 79 x 246mm<br>
        Net weight<br>
        1.6kg (3.6lbs)<br>
        Standard Equipment: 1 Phillips Bit.<br>
        Model comes without Battery and Charger
    </p>

This is the HTML I need to scrape: everything from <p>Capacity<br> to the end.

import requests
from bs4 import BeautifulSoup

url = "https://khusheimstore.com/product/makita-cordless-4-mode-impact-driver-for-18vli-ion-dtp140z-dtp140z-220/"
headers = {"Accept-Language": "en-US, en;q=0.5"}
results = requests.get(url, headers=headers)

soup = BeautifulSoup(results.text, "html.parser")

# initialize data storage
Techspec = []

Techspec.append(soup.findAll('div', attrs={"id": "woocommerce-Tabs-panel woocommerce-Tabs-panel--specification panel entry-content wc-tab"}))

Tags: python, web-scraping, beautifulsoup

Solution


The data you need is in the second <p> tag. Select that tag first, then extract the text from it.

Here is how it can be done.

import bs4 as bs

s = """
<div class="woocommerce-Tabs-panel woocommerce-Tabs-panel--specification panel entry-content wc-tab" id="tab-specification" role="tabpanel" aria-labelledby="tab-title-specification" style="display: block;">
                <table class="woocommerce-product-attributes shop_attributes">
            <tbody><tr class="woocommerce-product-attributes-item woocommerce-product-attributes-item--weight">
            <th class="woocommerce-product-attributes-item__label">Weight</th>
            <td class="woocommerce-product-attributes-item__value">1.6 kg</td>
        </tr>
            <tr class="woocommerce-product-attributes-item woocommerce-product-attributes-item--attribute_pa_brands">
            <th class="woocommerce-product-attributes-item__label">brands</th>
            <td class="woocommerce-product-attributes-item__value"><p><a href="https://khusheimstore.com/brands/makita/" rel="tag">MAKITA</a></p>
</td>
        </tr>
    </tbody></table>
<p>Capacity<br>
Steel : 10mm (3/8″)<br>
Wood : 21mm (13/16″)<br>
Masonry : 8mm (5/16″)<br>
Impacts per minute (ipm)<br>
Impact-driver mode: 0 – 3,200<br>
Hammer drill mode: 0 – 27,600<br>
No load speed (rpm)<br>
Impact-driver mode: 0 – 2,300<br>
Drill mode (Hi / Lo): 0-2,300 / 0-700<br>
Screwdriver mode: 0 – 2,300<br>
Max fastening torque<br>
Impact-driver mode: 140Nm (1,240in.lbs)<br>
Drill mode (Hard/ Soft): 50/ 10Nm<br>
Dimensions (L x W x H)<br>
186 x 79 x 246mm<br>
Net weight<br>
1.6kg (3.6lbs)<br>
Standard Equipment: 1 Phillips Bit.<br>
Model comes without Battery and Charger</p>
"""

soup = bs.BeautifulSoup(s, 'lxml')
p = soup.find_all('p')[1]

# stripped_strings already yields each text fragment with whitespace removed
for line in p.stripped_strings:
    print(line)

Output:

Capacity
Steel : 10mm (3/8″)
Wood : 21mm (13/16″)
Masonry : 8mm (5/16″)
Impacts per minute (ipm)
Impact-driver mode: 0 – 3,200
Hammer drill mode: 0 – 27,600
No load speed (rpm)
Impact-driver mode: 0 – 2,300
Drill mode (Hi / Lo): 0-2,300 / 0-700
Screwdriver mode: 0 – 2,300
Max fastening torque
Impact-driver mode: 140Nm (1,240in.lbs)
Drill mode (Hard/ Soft): 50/ 10Nm
Dimensions (L x W x H)
186 x 79 x 246mm
Net weight
1.6kg (3.6lbs)
Standard Equipment: 1 Phillips Bit.
Model comes without Battery and Charger
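
A note on the original attempt: it most likely matched nothing because it passed the div's class list as the `id`, while the panel's actual id in the HTML above is `tab-specification`. Indexing `find_all('p')[1]` works for this snippet but depends on how many `<p>` tags come earlier in the document. As a possible alternative, the sketch below anchors on the panel id and takes only the `<p>` that is a direct child of the panel. The markup is a trimmed copy of the question's HTML, and the names `html`, `spec_p`, and `tech_spec` are illustrative; it assumes the live page has the same structure.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the panel markup from the question
# (assumption: the live page has the same structure).
html = """
<div class="woocommerce-Tabs-panel wc-tab" id="tab-specification">
  <table><tbody><tr>
    <th>brands</th>
    <td><p><a href="https://khusheimstore.com/brands/makita/" rel="tag">MAKITA</a></p></td>
  </tr></tbody></table>
  <p>Capacity<br>
  Steel : 10mm (3/8″)<br>
  Model comes without Battery and Charger</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# "#tab-specification > p" matches only <p> elements that are direct
# children of the panel, so the <p> nested in the table cell is skipped.
spec_p = soup.select_one("#tab-specification > p")

# Guard against a missing panel instead of raising AttributeError.
tech_spec = list(spec_p.stripped_strings) if spec_p else []
print(tech_spec)
```

With the real page, `html` would be replaced by `results.text` from the `requests.get` call in the question.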
