python - Extracting CME data with Python and BeautifulSoup
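Question
I want to extract a list of symbols, together with each symbol's prices, from the CME website below. I can get the list of symbols, but I can't figure out how to pull the prices for each row.
When I use "Inspect" in the browser, I run into a problem: the tags I need to query are not `span` tags. Any ideas on how to solve this?
Code: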
from contextlib import closing

from requests import get
from requests.exceptions import RequestException
from bs4 import BeautifulSoup


def simple_get(url):
    """
    Attempts to get the content at `url` by making an HTTP GET request.
    If the content-type of response is some kind of HTML/XML, return the
    text content, otherwise return None.
    """
    try:
        with closing(get(url, stream=True)) as resp:
            if is_good_response(resp):
                return resp.content
            else:
                return None
    except RequestException as e:
        log_error('Error during requests to {0} : {1}'.format(url, str(e)))
        return None


def is_good_response(resp):
    """
    Returns True if the response seems to be HTML, False otherwise.
    """
    # .get() avoids a KeyError when the server sends no Content-Type header
    content_type = resp.headers.get('Content-Type', '').lower()
    return (resp.status_code == 200
            and content_type.find('html') > -1)


def log_error(e):
    print(e)


raw_html = simple_get('https://www.cmegroup.com/trading/price-limits.html#equityIndex')
html = BeautifulSoup(raw_html, 'html.parser', store_line_numbers=True)
seq = ['ESM0', 'NQM0', 'RTYM0', 'YMM0']
for quote in html.find_all('span'):
    symbolcme = quote.get_text(strip=True)
    #print("Check Symbol: ", symbolcme)
    for text in seq:
        if text in symbolcme:
            print(quote.sourceline, ' Symbol:', symbolcme)
Result:
2014 Symbol: E-mini S&P 500 Futures (ESM0)
2047 Symbol: E-mini Nasdaq-100 Futures (NQM0)
2065 Symbol: E-mini Dow ($5) Futures (YMM0)
2392 Symbol: E-mini Russell 2000 Index Futures (RTYM0)
2500 Symbol: Micro E-mini Dow Jones Industrial Average Index Futures (MYMM0)
2515 Symbol: Micro E-mini Nasdaq-100 Index Futures (MNQM0)
2551 Symbol: Micro E-mini S&P 500 Index Futures (MESM0)
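The substring test `text in symbolcme` is why the Micro contracts show up in the result: `'YMM0'` is a substring of `'MYMM0'`, and likewise `NQM0`/`MNQM0` and `ESM0`/`MESM0`. A minimal sketch of exact matching, assuming the contract code is always the trailing parenthesized token of the span text (as it is in every line above); `contract_code` and `CODE_RE` are hypothetical names, not part of the original code.

```python
import re

# Assumption: the contract code is the trailing "(...)" group of the label,
# e.g. "E-mini S&P 500 Futures (ESM0)" -> "ESM0".
CODE_RE = re.compile(r"\(([A-Z0-9]+)\)\s*$")


def contract_code(label):
    """Return the trailing parenthesized code of a label, or None."""
    match = CODE_RE.search(label)
    return match.group(1) if match else None


seq = {'ESM0', 'NQM0', 'RTYM0', 'YMM0'}

labels = [
    "E-mini S&P 500 Futures (ESM0)",
    "E-mini Dow ($5) Futures (YMM0)",
    "Micro E-mini Dow Jones Industrial Average Index Futures (MYMM0)",
]

for label in labels:
    # Exact set membership instead of a substring test, so MYMM0 is skipped.
    if contract_code(label) in seq:
        print(label)
```

The "($5)" group in the Dow label is not mistaken for a code because `$` is outside the character class and the pattern is anchored to the end of the string.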
Solution
BeautifulSoup's `tag.parent` and `tag.next_siblings` work well together here, assuming you want the prices that appear in the same row as the printed quote.
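To see why `quote.parent.next_siblings` yields the prices, here is a self-contained sketch against a small hand-written table. The markup is an assumption modeled on the output shown below (the symbol `<span>` sits inside a cell and the price cells follow it in the same row); it is not the actual CME page. Because `next_siblings` also yields the whitespace text between cells, this version keeps only `Tag` siblings. `prices_by_symbol` and `SAMPLE_HTML` are hypothetical names.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup: NOT copied from cmegroup.com, only shaped like a row
# whose first cell holds the symbol <span>. Values reuse the answer's output.
SAMPLE_HTML = """
<table>
  <tr>
    <td><span>E-mini S&amp;P 500 Futures (ESM0)</span></td>
    <td>240000</td>
    <td>252000 / 228000</td>
  </tr>
  <tr>
    <td><span>E-mini Dow ($5) Futures (YMM0)</span></td>
    <td>19951</td>
    <td>20955 / 18947</td>
  </tr>
</table>
"""


def prices_by_symbol(markup, seq):
    """Map each matching span's text to the text of the cells after it."""
    soup = BeautifulSoup(markup, 'html.parser')
    result = {}
    for quote in soup.find_all('span'):
        symbolcme = quote.get_text(strip=True)
        if any(text in symbolcme for text in seq):
            # quote.parent is the <td> holding the span; its next siblings
            # are the remaining cells of the row plus whitespace strings,
            # which the isinstance check drops.
            result[symbolcme] = [
                sibling.get_text(strip=True)
                for sibling in quote.parent.next_siblings
                if isinstance(sibling, Tag)
            ]
    return result


for symbol, prices in prices_by_symbol(SAMPLE_HTML, ['ESM0', 'YMM0']).items():
    print(symbol, prices)
```

On the real page the span may be nested more deeply, in which case `quote.parent` would need to be replaced by the enclosing cell (for example via `quote.find_parent('td')`); check the structure with "Inspect" first.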
for quote in html.find_all('span'):
    symbolcme = quote.get_text(strip=True)
    for text in seq:
        if text in symbolcme:
            print(quote.sourceline, ' Symbol:', symbolcme)
            # quote.parent is the element holding the span; its following
            # siblings are the elements after it in the same row
            prices = [sibling.get_text() for sibling in quote.parent.next_siblings]
            print(prices)
Output:
2015 Symbol: E-mini S&P 500 Futures (ESM0)
['240000', '252000 / 228000', '223150', '208700', '191850']
2048 Symbol: E-mini Nasdaq-100 Futures (NQM0)
['726475', '762900 / 690050', '675475', '631725', '580725']
2066 Symbol: E-mini Dow ($5) Futures (YMM0)
['19951', '20955 / 18947', '18545', '17340', '15934']
2393 Symbol: E-mini Russell 2000 Index Futures (RTYM0)
['104070', '109360 / 98780', '96660', '90310', '82900']
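The scraped cells are plain strings, and one column holds two slash-separated values. A minimal stdlib sketch for turning a row such as the ESM0 list above into numbers; it makes no assumption about what each column means, keeps the integers exactly as shown on the page (no scaling), and `parse_cell` is a hypothetical name.

```python
def parse_cell(cell):
    """Turn '240000' into 240000 and '252000 / 228000' into (252000, 228000)."""
    parts = [part.strip() for part in cell.split('/')]
    numbers = tuple(int(part) for part in parts)
    # Single values are returned as a plain int, slash pairs as a tuple.
    return numbers[0] if len(numbers) == 1 else numbers


row = ['240000', '252000 / 228000', '223150', '208700', '191850']
parsed = [parse_cell(cell) for cell in row]
print(parsed)  # [240000, (252000, 228000), 223150, 208700, 191850]
```

`int()` raises `ValueError` on anything that is not a plain integer, which is a useful signal that the page layout (or the sibling selection) has changed.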