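python - Scraping all tables from a web page with Python and BeautifulSoup 4
Question
I want to use BeautifulSoup to get all the tables at this link: https://www.investing.com/indices/indices-futures
and then get the titles in the Index column along with the links behind those titles.
I only want the contents of the first column.
For example: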
title          href
Dow Jones      /indices/us-30-futures
S&P 500        /indices/us-spx-500-futures
...
Mini DAX       /indices/mini-dax-futures
...
VSTOXX Mini    /indices/vstoxx-mini
I am using the following code:
import requests
from bs4 import BeautifulSoup

url = "https://www.investing.com/indices/indices-futures"
req = requests.get(url, headers=urlheader)
soup = BeautifulSoup(req.content, "lxml")
table = soup.find('div', id="cross_rates_container")
for a in table.find_all('a', href=True):
    print(a['title'], a['href'])
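I can see the table variable, but I can't seem to access the title (which holds the index name) or the href (which holds the link).
What is wrong with it, and how can I get the entries of all the tables at once?
Answer
You can iterate over the <td> elements and take the <a> links directly beneath them.
For example: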
import requests
from bs4 import BeautifulSoup

url = 'https://www.investing.com/indices/indices-futures'
# Send a browser-like User-Agent with the request.
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')

print('{:<30} {}'.format('Title', 'URL'))
# Select every <a> that is a direct child of a <td class="plusIconTd">.
for a in soup.select('td.plusIconTd > a'):
    print('{:<30} {}'.format(a.text, 'https://www.investing.com' + a['href']))
Prints:
Title                          URL
Dow Jones                      https://www.investing.com/indices/us-30-futures
S&P 500                        https://www.investing.com/indices/us-spx-500-futures
Nasdaq                         https://www.investing.com/indices/nq-100-futures
SmallCap 2000                  https://www.investing.com/indices/smallcap-2000-futures
S&P 500 VIX                    https://www.investing.com/indices/us-spx-vix-futures
DAX                            https://www.investing.com/indices/germany-30-futures
CAC 40                         https://www.investing.com/indices/france-40-futures
FTSE 100                       https://www.investing.com/indices/uk-100-futures
Euro Stoxx 50                  https://www.investing.com/indices/eu-stocks-50-futures
FTSE MIB                       https://www.investing.com/indices/italy-40-futures
SMI                            https://www.investing.com/indices/switzerland-20-futures
IBEX 35                        https://www.investing.com/indices/spain-35-futures
ATX                            https://www.investing.com/indices/austria-20-futures
WIG20                          https://www.investing.com/indices/poland-20-futures
AEX                            https://www.investing.com/indices/netherlands-25-futures
BUX                            https://www.investing.com/indices/hungary-14-futures
RTS                            https://www.investing.com/indices/rts-cash-settled-futures
... and so on.
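The loop above builds each absolute URL by concatenating the site root with the href. As a small alternative sketch, `urllib.parse.urljoin` from the standard library resolves root-relative paths the same way and leaves already-absolute links unchanged. The helper name `to_absolute` is hypothetical, and the sample hrefs are taken from the question's example.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.investing.com"


def to_absolute(href, base=BASE_URL):
    """Resolve href against base (illustrative helper, not part of the answer)."""
    return urljoin(base, href)


# Sample hrefs copied from the question's expected output.
for href in ["/indices/us-30-futures", "/indices/mini-dax-futures"]:
    print(to_absolute(href))
```

In the answer's loop this would replace `'https://www.investing.com' + a['href']` with `to_absolute(a['href'])`.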
Edit: [screenshot showing the <td> elements]
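As for the original attempt: in BeautifulSoup, `a['title']` raises `KeyError` when an anchor has no `title` attribute, whereas `a.get('title')` returns `None`. Whether that is what broke the question's loop depends on the live page markup, so the sketch below runs against a small made-up HTML fragment instead of the real site. The fragment and the function names `anchor_titles` and `extract_links` are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

# Made-up fragment loosely modelled on the selector used in the answer;
# the real page markup may differ.
SAMPLE_HTML = """
<div id="cross_rates_container">
  <table>
    <tr>
      <td class="plusIconTd"><a href="/indices/us-30-futures" title="Dow Jones Futures">Dow Jones</a></td>
      <td><a href="/some-other-link">no title attribute here</a></td>
    </tr>
    <tr>
      <td class="plusIconTd"><a href="/indices/us-spx-500-futures">S&amp;P 500</a></td>
    </tr>
  </table>
</div>
"""


def anchor_titles(html):
    """Return the title attribute of every <a href>, or None where it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    # .get() avoids the KeyError that a['title'] raises on a missing attribute.
    return [a.get("title") for a in soup.find_all("a", href=True)]


def extract_links(html):
    """Return (text, href) pairs for anchors directly under td.plusIconTd."""
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get_text(strip=True), a["href"])
            for a in soup.select("td.plusIconTd > a[href]")]


print(anchor_titles(SAMPLE_HTML))
for text, href in extract_links(SAMPLE_HTML):
    print(text, href)
```

Reading the link text rather than the `title` attribute, as the answer does, sidesteps the missing-attribute case entirely.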