Scraping all tables from a web page with Python bs4

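Question

I want to use BeautifulSoup to get all the tables on this page: https://www.investing.com/indices/indices-futures. From those tables I want the titles in the Index column, together with the link behind each title.

I only want the contents of the first column.

For example: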

title        href
Dow Jones    /indices/us-30-futures
S&P 500      /indices/us-spx-500-futures
...
Mini DAX     /indices/mini-dax-futures
...
VSTOXX Mini  /indices/vstoxx-mini 


I am using the following code:

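import requests
from bs4 import BeautifulSoup

url = "https://www.investing.com/indices/indices-futures"
# urlheader is the request-headers dict; it is defined elsewhere and not shown in the question
req = requests.get(url, headers=urlheader)
soup = BeautifulSoup(req.content, "lxml")
table = soup.find('div', id="cross_rates_container")
for a in table.find_all('a', href=True):
    print(a['title'], a['href'])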

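The table variable is populated, but I cannot seem to access the title (which holds the index name) or the href (which holds the link).

What is wrong with this code, and how can I get the entries of all the tables at once?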

Tags: python, web-scraping, beautifulsoup, datatables

Answer

You can iterate over the <td> elements of the first column and take the <a> link directly under each one.

For example:

import requests
from bs4 import BeautifulSoup


url = 'https://www.investing.com/indices/indices-futures'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')

print('{:<30} {}'.format('Title', 'URL'))
for a in soup.select('td.plusIconTd > a'):
    print('{:<30} {}'.format(a.text, 'https://www.investing.com' + a['href']))

Prints:

Title                          URL
Dow Jones                      https://www.investing.com/indices/us-30-futures
S&P 500                        https://www.investing.com/indices/us-spx-500-futures
Nasdaq                         https://www.investing.com/indices/nq-100-futures
SmallCap 2000                  https://www.investing.com/indices/smallcap-2000-futures
S&P 500 VIX                    https://www.investing.com/indices/us-spx-vix-futures
DAX                            https://www.investing.com/indices/germany-30-futures
CAC 40                         https://www.investing.com/indices/france-40-futures
FTSE 100                       https://www.investing.com/indices/uk-100-futures
Euro Stoxx 50                  https://www.investing.com/indices/eu-stocks-50-futures
FTSE MIB                       https://www.investing.com/indices/italy-40-futures
SMI                            https://www.investing.com/indices/switzerland-20-futures
IBEX 35                        https://www.investing.com/indices/spain-35-futures
ATX                            https://www.investing.com/indices/austria-20-futures
WIG20                          https://www.investing.com/indices/poland-20-futures
AEX                            https://www.investing.com/indices/netherlands-25-futures
BUX                            https://www.investing.com/indices/hungary-14-futures
RTS                            https://www.investing.com/indices/rts-cash-settled-futures

... and so on.
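
The selector above can also be tried without hitting the live site. The following is a minimal sketch, assuming a simplified table fragment: SAMPLE_HTML, BASE_URL and extract_links are hypothetical names introduced here for illustration, and only the plusIconTd class and the two hrefs come from the answer above. It also uses urljoin from the standard library rather than string concatenation to build absolute URLs.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.investing.com"

# Hypothetical, heavily simplified markup. The real page has more
# columns and attributes; only the plusIconTd class is taken from the answer.
SAMPLE_HTML = """
<table>
  <tr>
    <td class="flag"><span></span></td>
    <td class="plusIconTd"><a href="/indices/us-30-futures">Dow Jones</a></td>
    <td><a href="/somewhere-else">other column</a></td>
  </tr>
  <tr>
    <td class="flag"><span></span></td>
    <td class="plusIconTd"><a href="/indices/us-spx-500-futures">S&amp;P 500</a></td>
    <td>other column</td>
  </tr>
</table>
"""


def extract_links(html, base_url=BASE_URL):
    """Return (title, absolute URL) pairs for links in the first-column cells."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    # "td.plusIconTd > a[href]" matches only anchors that are direct
    # children of the first-column cells and that carry an href.
    for a in soup.select("td.plusIconTd > a[href]"):
        pairs.append((a.get_text(strip=True), urljoin(base_url, a["href"])))
    return pairs


if __name__ == "__main__":
    for title, link in extract_links(SAMPLE_HTML):
        print("{:<30} {}".format(title, link))
```

Because the selector is applied to the whole document rather than to a single container, it picks up the matching cells from every table on the page in one pass.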

Edit: screenshot of the <td> elements (image not included).
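
As for why the loop in the question fails: one plausible cause (an assumption, since the page markup is not reproduced here) is that some of the <a> tags inside the container have no title attribute. In BeautifulSoup, a['title'] raises KeyError when the attribute is missing, whereas a.get('title') returns None. The sketch below shows the difference on a hypothetical two-link fragment; the HTML and the links_with_titles helper are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: one anchor has a title attribute, the other does not.
html = (
    '<div id="cross_rates_container">'
    '<a href="/indices/us-30-futures" title="Dow Jones">Dow Jones</a>'
    '<a href="/no-title-here">link without a title attribute</a>'
    "</div>"
)

soup = BeautifulSoup(html, "html.parser")
container = soup.find("div", id="cross_rates_container")


def links_with_titles(tag):
    """Collect (title, href) pairs, skipping anchors that lack a title."""
    rows = []
    for a in tag.find_all("a", href=True):
        title = a.get("title")  # None instead of KeyError when missing
        if title is None:
            continue
        rows.append((title, a["href"]))
    return rows


if __name__ == "__main__":
    print(links_with_titles(container))
```

Falling back to a.text, as the answer does, avoids the problem entirely, because every anchor has text content even when it has no title attribute.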

