python - Scraping an HTML table. All the classes have the same name but different content. How do I scrape the content?
Problem description
[<div class="d-row js-search-row js-acc-wrapper" id="stocks-row-1">
<div class="d-cell js-cell js-acc-activator" data-label="Instrument">0001</div>
<div class="d-cell hidden js-cell js-target" data-label="Company">CK Hutchison</div>
<div class="d-cell hidden js-target" data-label="Min traded quantity
">30</div>
<div class="d-cell hidden js-target" data-label="Margin - Retail clients">
20%
</div>
<div class="d-cell hidden js-target" data-label="Margin - Professional clients">
10%
</div>
<div class="d-cell hidden js-target" data-label="Long position swap">-0.018743</div>
<div class="d-cell hidden js-target" data-label="Short position swap">-0.009970</div>
<div class="d-cell hidden js-target" data-label="Market hours *">
1:30 am to 8:10 am
</div>
</div>
,
<div class="d-row js-search-row js-acc-wrapper" id="stocks-row-2">
<div class="d-cell js-cell js-acc-activator" data-label="Instrument">0002</div>
<div class="d-cell hidden js-cell js-target" data-label="Company">CLP Holdings Ltd.</div>
<div class="d-cell hidden js-target" data-label="Min traded quantity
">25</div>
<div class="d-cell hidden js-target" data-label="Margin - Retail clients">
20%
</div>
<div class="d-cell hidden js-target" data-label="Margin - Professional clients">
10%
</div>
<div class="d-cell hidden js-target" data-label="Long position swap">-0.023541</div>
<div class="d-cell hidden js-target" data-label="Short position swap">-0.012522</div>
<div class="d-cell hidden js-target" data-label="Market hours *">
1:30 am to 8:10 am
</div>
</div>
,
<div class="d-row js-search-row js-acc-wrapper" id="stocks-row-3">
<div class="d-cell js-cell js-acc-activator" data-label="Instrument">0003</div>
<div class="d-cell hidden js-cell js-target" data-label="Company">The Hong Kong and China Gas Company Ltd.</div>
<div class="d-cell hidden js-target" data-label="Min traded quantity
">100</div>
<div class="d-cell hidden js-target" data-label="Margin - Retail clients">
20%
</div>
<div class="d-cell hidden js-target" data-label="Margin - Professional clients">
10%
</div>
<div class="d-cell hidden js-target" data-label="Long position swap">-0.003874</div>
<div class="d-cell hidden js-target" data-label="Short position swap">-0.002061</div>
<div class="d-cell hidden js-target" data-label="Market hours *">
1:30 am to 8:10 am
</div>
</div>]
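Above are three rows that I scraped from a larger table. The data-label attribute is the column name, and each data-label has a value.
The original table looks like this: Link to Table
I am trying to get the values for each row, but the div class is exactly the same for most of the cells.
In the example above, you can see that most cells have the class d-cell hidden js-target.
I can find the Instrument and Company data because they have their own class. The following works fine: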
instrument = soup.findAll("div",{'class':'d-cell js-cell js-acc-activator'})
company = soup.findAll("div",{'class':'d-cell hidden js-cell js-target'})
But the rest of the data all shares the same class and differs only in an attribute called data-label.
If I use only the class, I get all the data mixed together.
soup.findAll("div",{'class':'d-cell hidden js-target'})
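This doesn't work.
For example, how do I get only Min traded quantity, then only Margin, and so on?
I don't know how to use the data-label attribute with findAll.
Here is my attempt to get Min traded quantity by data-label, adapted from a workaround in a Stack Overflow answer: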
min_traded_quantity = soup.findAll("div",{'class':'d-cell hidden js-target','data-label':"Min traded quantity"})
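The result is an empty list.
Honestly, I don't know what to google, because I don't know what this data-label thing is. The answers I found are somewhat similar to my problem but don't work for me. Is it another kind of class? Can I reference it somehow in findAll?
I also removed class from findAll and used only data-label, which did not work either: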
min_traded_quantity = soup.findAll("div",{'data-label':"Min traded quantity"})
Any suggestions?
And yes, I'm new to Beautiful Soup.
Solution
Just grab the divs that each represent a row, then find all the divs inside each row, and you're done.
Here's how:
import requests
from bs4 import BeautifulSoup
from tabulate import tabulate

# Send a browser-like User-Agent with the request.
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/87.0.4280.67 Safari/537.36"
}
url = "https://www.trading212.com/en/Trading-Instruments?id=3"
soup = BeautifulSoup(requests.get(url, headers=headers).text, "html.parser")

# Each of these divs is one row of the table.
table = soup.find_all("div", {"class": "d-row js-search-row js-acc-wrapper"})

# Column names come from the heading row; collapse whitespace inside each name.
columns = [
    " ".join(i.getText(strip=True).split()) for i
    in soup.find("div", {"class": "d-row hidden-heading"})
]

# One list of cell texts per row.
parsed_table = [
    [i.getText(strip=True) for i in row.find_all("div")] for row in table
]
print(tabulate(parsed_table, headers=columns))
Output:
Instrument Company Min traded quantity Margin - Retail clients Margin - Professional clients Long position swap Short position swap Market hours *
------------ ---------------------------------------------- --------------------- ------------------------- ------------------------------- -------------------- --------------------- -------------------
0001 CK Hutchison 30 20% 10% -0.018743 -0.00997 1:30 am to 8:10 am
0002 CLP Holdings Ltd. 25 20% 10% -0.023541 -0.012522 1:30 am to 8:10 am
0003 The Hong Kong and China Gas Company Ltd. 100 20% 10% -0.003874 -0.002061 1:30 am to 8:10 am
0004 The Wharf Ltd. 50 20% 10% -0.006152 -0.003273 1:30 am to 8:10 am
0011 Hang Seng Bank Ltd. 10 20% 10% -0.044209 -0.023516 1:30 am to 8:10 am
0016 Sun Hung Kai Properties Ltd. 15 20% 10% -0.033863 -0.018012 1:30 am to 8:10 am
0023 Bank of East Asia, Ltd. 100 20% 10% -0.005545 -0.00295 1:30 am to 8:10 am
0066 MTR Corporation Ltd. 50 20% 10% -0.013814 -0.007348 1:30 am to 8:10 am
0175 Geely Automobile Holdings Ltd. 150 20% 10% -0.006766 -0.003599 1:30 am to 8:10 am
0267 CITIC Ltd. 250 20% 10% -0.002019 -0.001074 1:30 am to 8:10 am
0291 China Resources Beer Company Ltd. 50 20% 10% -0.020097 -0.01069 1:30 am to 8:10 am
0388 Hong Kong Exchanges and Clearing Ltd. 10 20% 10% -0.12393 -0.06592 1:30 am to 8:10 am
0390 China Railway Group Ltd. 1 20% 10% -0.001255 -0.000668 1:30 am to 8:10 am
0688 China overseas 50 20% 10% -0.006303 -0.003352 1:30 am to 8:10 am
0700 Tencent Holdings Ltd 10 20% 10% -0.177066 -0.089409 1:30 am to 8:10 am
0728 China Telecom Corporation Limited 1 20% 10% -0.000769 -0.000409 1:30 am to 8:10 am
0762 China Unicom (Hong Kong) Limited. 1 20% 10% -0.001519 -0.000808 1:30 am to 8:10 am
0857 PetroChina Company Limited. 1 20% 10% -0.000808 -0.00043 1:30 am to 8:10 am
0883 CNOOC Ltd. 150 20% 10% -0.002516 -0.001339 1:30 am to 8:10 am
0916 China Longyuan Power Group Corporation Limited 1 20% 10% -0.001511 -0.000804 1:30 am to 8:10 am
0939 China Construction Bank Corporation 1 20% 10% -0.002019 -0.001074 1:30 am to 8:10 am
1088 China Shenhua Energy Company Ltd. 100 20% 10% -0.004984 -0.002651 1:30 am to 8:10 am
1299 AIA Group Ltd. 100 20% 20% -0.028828 -0.015334 1:30 am to 8:10 am
1337 Razer Inc. 1 20% 5% -0.000838 -0.000423 1:30 am to 8:10 am
1810 Xiaomi Corp 1 50% 50% -0.008597 -0.002866 1:30 am to 8:10 am
1COV Covestro AG 1 20% 5% -0.013721 -0.008687 8:00 am to 4:30 pm
21P1 Aurora Cannabis, Inc. 1 50% 50% -0.006535 -0.001592 8:00 am to 4:30 pm
2318 Ping An Insurance Company of China, Ltd. 25 20% 10% -0.031015 -0.016497 1:30 am to 8:10 am
2388 BOC Hong Kong Ltd. 10 20% 5% -0.008086 -0.004301 1:30 am to 8:10 am
2628 China Life Insurance Company Ltd. 150 20% 10% -0.005885 -0.00313 1:30 am to 8:10 am
2914 Japan Tobacco Inc 1 20% 5% -0.472853 -0.538543 12:00 am to 6:00 am
3328 Bank of Communications Co., Ltd. 300 20% 10% -0.001405 -0.000747 1:30 am to 8:10 am
3333 China Evergrande Group 1 20% 5% -0.004829 -0.001909 1:30 am to 8:10 am
3382 Seven & i Holdings Co., Ltd. 1 20% 5% -0.765264 -0.871578 12:00 am to 6:00 am
3836 China Harmony New Energy Auto Holding Ltd 1 20% 5% -0.001337 -0.000711 1:30 am to 8:10 am
3988 Bank Of China Ltd. 1000 20% 10% -0.000893 -0.000475 1:30 am to 8:10 am
4063 Shin-Etsu Chemical Co Ltd 5 20% 20% -3.92842 -4.47417 12:00 am to 6:00 am
4452 Kao Corp 10 20% 20% -1.73538 -1.97647 12:00 am to 6:00 am
4502 Takeda Pharmaceutical Company Limited 1 20% 5% -0.851379 -0.969656 12:00 am to 6:00 am
4503 Astellas Pharma Inc 1 20% 5% -0.332695 -0.378914 12:00 am to 6:00 am
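On the data-label attempts in the question: judging from the markup posted there, the data-label value for that column ends with a line break (Min traded quantity followed by a newline before the closing quote), so an exact string match on "Min traded quantity" would plausibly find nothing. The sketch below shows one way to pick a single column anyway by comparing the stripped attribute value. It assumes the live page has the same shape as the posted sample; the helper name cells_for and the trimmed-down html string are made up for illustration.

```python
from bs4 import BeautifulSoup

# A few cells copied from the question's sample. Note the line break
# inside the data-label value of the second cell.
html = '''
<div class="d-row js-search-row js-acc-wrapper" id="stocks-row-1">
<div class="d-cell js-cell js-acc-activator" data-label="Instrument">0001</div>
<div class="d-cell hidden js-target" data-label="Min traded quantity
">30</div>
<div class="d-cell hidden js-target" data-label="Margin - Retail clients">
20%
</div>
</div>
'''

soup = BeautifulSoup(html, "html.parser")


def cells_for(soup, label):
    """Hypothetical helper: text of every div whose data-label equals
    `label` once surrounding whitespace is stripped from the attribute."""
    return [
        div.get_text(strip=True)
        # data-label=True keeps only divs that carry the attribute at all.
        for div in soup.find_all("div", attrs={"data-label": True})
        if div["data-label"].strip() == label
    ]


# Exact matching misses the cell because of the trailing newline.
exact = soup.find_all("div", {"data-label": "Min traded quantity"})
print(len(exact))

print(cells_for(soup, "Min traded quantity"))
print(cells_for(soup, "Margin - Retail clients"))
```

Filtering in plain Python after collecting every div that has a data-label keeps the comparison rule visible and easy to adjust, for example to a startswith check if both Margin columns are wanted.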
Bonus:
This works for any id on that page. For example, try this URL: https://www.trading212.com/en/Trading-Instruments?id=1
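If the values are needed by column name rather than as a printed table, the same row-by-row idea can be sketched with one dict per row, keyed by the cleaned-up data-label. This is only an illustration on two shortened rows from the question's sample, not a tested scraper for the live page; rows_as_dicts and the html string are hypothetical names introduced here.

```python
from bs4 import BeautifulSoup

# Two shortened rows based on the sample posted in the question.
html = '''
<div class="d-row js-search-row js-acc-wrapper" id="stocks-row-1">
<div class="d-cell js-cell js-acc-activator" data-label="Instrument">0001</div>
<div class="d-cell hidden js-cell js-target" data-label="Company">CK Hutchison</div>
<div class="d-cell hidden js-target" data-label="Min traded quantity
">30</div>
</div>
<div class="d-row js-search-row js-acc-wrapper" id="stocks-row-2">
<div class="d-cell js-cell js-acc-activator" data-label="Instrument">0002</div>
<div class="d-cell hidden js-cell js-target" data-label="Company">CLP Holdings Ltd.</div>
<div class="d-cell hidden js-target" data-label="Min traded quantity
">25</div>
</div>
'''


def rows_as_dicts(soup):
    """Hypothetical helper: one dict per row, mapping column name to cell text."""
    records = []
    # class_ matches a single class even when the tag carries several.
    for row in soup.find_all("div", class_="js-search-row"):
        record = {}
        for cell in row.find_all("div", attrs={"data-label": True}):
            # Collapse whitespace so the stray newline does not end up in the key.
            key = " ".join(cell["data-label"].split())
            record[key] = cell.get_text(strip=True)
        records.append(record)
    return records


records = rows_as_dicts(BeautifulSoup(html, "html.parser"))
for record in records:
    print(record["Instrument"], record["Company"], record["Min traded quantity"])
```

A list of dicts like this can be handed to csv.DictWriter or a DataFrame constructor, and a single column is then just a list comprehension over records.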