Scraping an HTML table: all the cells share the same class name but hold different content. How do I scrape the content?

Problem description

[<div class="d-row js-search-row js-acc-wrapper" id="stocks-row-1">
   <div class="d-cell js-cell js-acc-activator" data-label="Instrument">0001</div>
   <div class="d-cell hidden js-cell js-target" data-label="Company">CK Hutchison</div>
   <div class="d-cell hidden js-target" data-label="Min traded quantity
      ">30</div>
   <div class="d-cell hidden js-target" data-label="Margin - Retail clients">
      20%                    
   </div>
   <div class="d-cell hidden js-target" data-label="Margin - Professional clients">
      10%                    
   </div>
   <div class="d-cell hidden js-target" data-label="Long position swap">-0.018743</div>
   <div class="d-cell hidden js-target" data-label="Short position swap">-0.009970</div>
   <div class="d-cell hidden js-target" data-label="Market hours *">
      1:30 am to 8:10 am
   </div>
</div>
,
<div class="d-row js-search-row js-acc-wrapper" id="stocks-row-2">
   <div class="d-cell js-cell js-acc-activator" data-label="Instrument">0002</div>
   <div class="d-cell hidden js-cell js-target" data-label="Company">CLP Holdings Ltd.</div>
   <div class="d-cell hidden js-target" data-label="Min traded quantity
      ">25</div>
   <div class="d-cell hidden js-target" data-label="Margin - Retail clients">
      20%                    
   </div>
   <div class="d-cell hidden js-target" data-label="Margin - Professional clients">
      10%                    
   </div>
   <div class="d-cell hidden js-target" data-label="Long position swap">-0.023541</div>
   <div class="d-cell hidden js-target" data-label="Short position swap">-0.012522</div>
   <div class="d-cell hidden js-target" data-label="Market hours *">
      1:30 am to 8:10 am
   </div>
</div>
,
<div class="d-row js-search-row js-acc-wrapper" id="stocks-row-3">
   <div class="d-cell js-cell js-acc-activator" data-label="Instrument">0003</div>
   <div class="d-cell hidden js-cell js-target" data-label="Company">The Hong Kong and China Gas Company Ltd.</div>
   <div class="d-cell hidden js-target" data-label="Min traded quantity
      ">100</div>
   <div class="d-cell hidden js-target" data-label="Margin - Retail clients">
      20%                    
   </div>
   <div class="d-cell hidden js-target" data-label="Margin - Professional clients">
      10%                    
   </div>
   <div class="d-cell hidden js-target" data-label="Long position swap">-0.003874</div>
   <div class="d-cell hidden js-target" data-label="Short position swap">-0.002061</div>
   <div class="d-cell hidden js-target" data-label="Market hours *">
      1:30 am to 8:10 am
   </div>
</div>]

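Above are three rows that I scraped from a larger table. "data-label" is the column name, and each data-label has a value.

The original table looks like this: Link to Table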

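I am trying to get the values for each row, but the div classes are exactly the same for most of the cells.

In the example above, you can see that most cells have the class d-cell hidden js-target.

I can find the Instrument and Company data because they have their own classes. The following works well: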

instrument = soup.findAll("div",{'class':'d-cell js-cell js-acc-activator'})
company = soup.findAll("div",{'class':'d-cell hidden js-cell js-target'})

But the rest of the data all share the same class and differ only in an attribute named data-label.

If I use only the class, I get all the data mixed together.

soup.findAll("div",{'class':'d-cell hidden js-target'})

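That is not usable.

For example, how do I get only Min traded quantity, then only Margin, and so on?

I don't know how to use the data-label attribute with findAll.

This is my attempt to get Min traded quantity using data-label, which is a workaround from this Stack Overflow answer here: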

min_traded_quantity = soup.findAll("div",{'class':'d-cell hidden js-target','data-label':"Min traded quantity"})

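The result is an empty list.

Honestly, I don't know what to Google, because I don't know what this data-label thing is. The answers I found are somewhat similar to my problem but don't work for me. Is it another type of class? Can I reference it somehow in findAll?

I also removed the class from findAll and used only data-label, which does not work either: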

min_traded_quantity = soup.findAll("div",{'data-label':"Min traded quantity"})

Any suggestions?

And yes, I'm new to Beautiful Soup.

Tags: python, html, beautifulsoup

Solution


Just grab the divs that each represent a row, then find all the divs inside each row, and you're done.

Here's how:

import requests
from bs4 import BeautifulSoup
from tabulate import tabulate


headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/87.0.4280.67 Safari/537.36"
}

url = "https://www.trading212.com/en/Trading-Instruments?id=3"
soup = BeautifulSoup(requests.get(url, headers=headers).text, "html.parser")

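# Every data row is a div carrying these three classes.
table = soup.find_all("div", {"class": "d-row js-search-row js-acc-wrapper"})
# Column names come from the heading row (class "d-row hidden-heading");
# split/join collapses any internal whitespace in each name.
columns = [
    " ".join(i.getText(strip=True).split()) for i
    in soup.find("div", {"class": "d-row hidden-heading"})
]
# One list of cell texts per row, in document order.
parsed_table = [
    [i.getText(strip=True) for i in row.find_all("div")] for row in table
]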

print(tabulate(parsed_table, headers=columns))

Output:

Instrument    Company                                           Min traded quantity  Margin - Retail clients    Margin - Professional clients      Long position swap    Short position swap  Market hours *
------------  ----------------------------------------------  ---------------------  -------------------------  -------------------------------  --------------------  ---------------------  -------------------
0001          CK Hutchison                                                       30  20%                        10%                                         -0.018743              -0.00997   1:30 am to 8:10 am
0002          CLP Holdings Ltd.                                                  25  20%                        10%                                         -0.023541              -0.012522  1:30 am to 8:10 am
0003          The Hong Kong and China Gas Company Ltd.                          100  20%                        10%                                         -0.003874              -0.002061  1:30 am to 8:10 am
0004          The Wharf Ltd.                                                     50  20%                        10%                                         -0.006152              -0.003273  1:30 am to 8:10 am
0011          Hang Seng Bank Ltd.                                                10  20%                        10%                                         -0.044209              -0.023516  1:30 am to 8:10 am
0016          Sun Hung Kai Properties Ltd.                                       15  20%                        10%                                         -0.033863              -0.018012  1:30 am to 8:10 am
0023          Bank of East Asia, Ltd.                                           100  20%                        10%                                         -0.005545              -0.00295   1:30 am to 8:10 am
0066          MTR Corporation Ltd.                                               50  20%                        10%                                         -0.013814              -0.007348  1:30 am to 8:10 am
0175          Geely Automobile Holdings Ltd.                                    150  20%                        10%                                         -0.006766              -0.003599  1:30 am to 8:10 am
0267          CITIC Ltd.                                                        250  20%                        10%                                         -0.002019              -0.001074  1:30 am to 8:10 am
0291          China Resources Beer Company Ltd.                                  50  20%                        10%                                         -0.020097              -0.01069   1:30 am to 8:10 am
0388          Hong Kong Exchanges and Clearing Ltd.                              10  20%                        10%                                         -0.12393               -0.06592   1:30 am to 8:10 am
0390          China Railway Group Ltd.                                            1  20%                        10%                                         -0.001255              -0.000668  1:30 am to 8:10 am
0688          China overseas                                                     50  20%                        10%                                         -0.006303              -0.003352  1:30 am to 8:10 am
0700          Tencent Holdings Ltd                                               10  20%                        10%                                         -0.177066              -0.089409  1:30 am to 8:10 am
0728          China Telecom Corporation Limited                                   1  20%                        10%                                         -0.000769              -0.000409  1:30 am to 8:10 am
0762          China Unicom (Hong Kong) Limited.                                   1  20%                        10%                                         -0.001519              -0.000808  1:30 am to 8:10 am
0857          PetroChina Company Limited.                                         1  20%                        10%                                         -0.000808              -0.00043   1:30 am to 8:10 am
0883          CNOOC Ltd.                                                        150  20%                        10%                                         -0.002516              -0.001339  1:30 am to 8:10 am
0916          China Longyuan Power Group Corporation Limited                      1  20%                        10%                                         -0.001511              -0.000804  1:30 am to 8:10 am
0939          China Construction Bank Corporation                                 1  20%                        10%                                         -0.002019              -0.001074  1:30 am to 8:10 am
1088          China Shenhua Energy Company Ltd.                                 100  20%                        10%                                         -0.004984              -0.002651  1:30 am to 8:10 am
1299          AIA Group Ltd.                                                    100  20%                        20%                                         -0.028828              -0.015334  1:30 am to 8:10 am
1337          Razer Inc.                                                          1  20%                        5%                                          -0.000838              -0.000423  1:30 am to 8:10 am
1810          Xiaomi Corp                                                         1  50%                        50%                                         -0.008597              -0.002866  1:30 am to 8:10 am
1COV          Covestro AG                                                         1  20%                        5%                                          -0.013721              -0.008687  8:00 am to 4:30 pm
21P1          Aurora Cannabis, Inc.                                               1  50%                        50%                                         -0.006535              -0.001592  8:00 am to 4:30 pm
2318          Ping An Insurance Company of China, Ltd.                           25  20%                        10%                                         -0.031015              -0.016497  1:30 am to 8:10 am
2388          BOC Hong Kong Ltd.                                                 10  20%                        5%                                          -0.008086              -0.004301  1:30 am to 8:10 am
2628          China Life Insurance Company Ltd.                                 150  20%                        10%                                         -0.005885              -0.00313   1:30 am to 8:10 am
2914          Japan Tobacco Inc                                                   1  20%                        5%                                          -0.472853              -0.538543  12:00 am to 6:00 am
3328          Bank of Communications Co., Ltd.                                  300  20%                        10%                                         -0.001405              -0.000747  1:30 am to 8:10 am
3333          China Evergrande Group                                              1  20%                        5%                                          -0.004829              -0.001909  1:30 am to 8:10 am
3382          Seven & i Holdings Co., Ltd.                                        1  20%                        5%                                          -0.765264              -0.871578  12:00 am to 6:00 am
3836          China Harmony New Energy Auto Holding Ltd                           1  20%                        5%                                          -0.001337              -0.000711  1:30 am to 8:10 am
3988          Bank Of China Ltd.                                               1000  20%                        10%                                         -0.000893              -0.000475  1:30 am to 8:10 am
4063          Shin-Etsu Chemical Co Ltd                                           5  20%                        20%                                         -3.92842               -4.47417   12:00 am to 6:00 am
4452          Kao Corp                                                           10  20%                        20%                                         -1.73538               -1.97647   12:00 am to 6:00 am
4502          Takeda Pharmaceutical Company Limited                               1  20%                        5%                                          -0.851379              -0.969656  12:00 am to 6:00 am
4503          Astellas Pharma Inc                                                 1  20%                        5%                                          -0.332695              -0.378914  12:00 am to 6:00 am
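
If you only need a single column, matching on data-label directly is also possible. A likely reason the exact-match attempts in the question return an empty list is that the "Min traded quantity" data-label value in the posted markup contains a line break and trailing spaces, so it is not equal to the plain string. The sketch below assumes that is the cause and passes a function as the attribute filter so the value is stripped before comparison. The helper name cells_for and the shortened inline HTML are made up for illustration.

```python
from bs4 import BeautifulSoup

# Two shortened rows in the same shape as the question's markup. Note the
# newline and spaces inside the "Min traded quantity" data-label value.
html = """
<div class="d-row js-search-row js-acc-wrapper" id="stocks-row-1">
   <div class="d-cell js-cell js-acc-activator" data-label="Instrument">0001</div>
   <div class="d-cell hidden js-target" data-label="Min traded quantity
      ">30</div>
   <div class="d-cell hidden js-target" data-label="Margin - Retail clients">
      20%
   </div>
</div>
<div class="d-row js-search-row js-acc-wrapper" id="stocks-row-2">
   <div class="d-cell js-cell js-acc-activator" data-label="Instrument">0002</div>
   <div class="d-cell hidden js-target" data-label="Min traded quantity
      ">25</div>
   <div class="d-cell hidden js-target" data-label="Margin - Retail clients">
      20%
   </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def cells_for(soup, label):
    """Hypothetical helper: text of every div whose data-label equals
    `label` after surrounding whitespace is stripped."""
    matches = soup.find_all(
        "div",
        # The filter is called with None for divs lacking the attribute.
        attrs={"data-label": lambda value: value is not None and value.strip() == label},
    )
    return [cell.get_text(strip=True) for cell in matches]


min_traded_quantity = cells_for(soup, "Min traded quantity")
retail_margin = cells_for(soup, "Margin - Retail clients")
print(min_traded_quantity)  # ['30', '25']
print(retail_margin)        # ['20%', '20%']
```

Stripping both the attribute value and the cell text keeps the comparison independent of how the page happens to be indented.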

Bonus:

This works for any id on that page. For example, try this URL: https://www.trading212.com/en/Trading-Instruments?id=1
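
Since the same parsing is meant to be reused across pages, it may help to wrap it in a function that takes the page HTML and returns one dict per row, keyed by the cleaned data-label. This is only a sketch: parse_instruments is a made-up name, the inline sample is a shortened copy of the first row from the question, and it assumes the other id pages use the same markup. For a live page you would pass in the text of the response fetched as in the code above.

```python
from bs4 import BeautifulSoup


def parse_instruments(html):
    """Hypothetical helper: one dict per row, mapping data-label -> cell text."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    # A single class is enough to pick out the row divs.
    for row in soup.find_all("div", class_="js-search-row"):
        record = {}
        # data-label=True matches any div that has the attribute at all.
        for cell in row.find_all("div", attrs={"data-label": True}):
            # Collapse the stray newline and spaces inside some labels.
            label = " ".join(cell["data-label"].split())
            record[label] = cell.get_text(strip=True)
        records.append(record)
    return records


sample = """
<div class="d-row js-search-row js-acc-wrapper" id="stocks-row-1">
   <div class="d-cell js-cell js-acc-activator" data-label="Instrument">0001</div>
   <div class="d-cell hidden js-cell js-target" data-label="Company">CK Hutchison</div>
   <div class="d-cell hidden js-target" data-label="Min traded quantity
      ">30</div>
   <div class="d-cell hidden js-target" data-label="Market hours *">
      1:30 am to 8:10 am
   </div>
</div>
"""

records = parse_instruments(sample)
print(records[0]["Company"])              # CK Hutchison
print(records[0]["Min traded quantity"])  # 30
```

Keying by label rather than by position means a column can be looked up by name even if a page orders or omits columns differently.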

