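python - Web scraping stock prices - Yahoo Finance
Problem description
With my code I can get real-time stock prices from Yahoo Finance.
The variable 'maks' defines how many seconds of real-time data to record. This works fine up to about 2000 seconds (roughly 2000 price ticks).
However, when I set a longer period, such as 2 hours or more, I get the error shown after the code: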
from bs4 import BeautifulSoup
import ssl
import sys
import time
from urllib.request import Request, urlopen

# For ignoring SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

maks = int(input('Enter time to record data (seconds) : '))

# Lists for collected values
price_list = []
vol_list = []
time_list = []

print("Parsing data, please wait..")
start = time.time()
i = 0
while i < maks:
    # Making the website believe that you are accessing it using a Mozilla browser
    req = Request('http://finance.yahoo.com/quote/BTC-USD', headers={'User-Agent': 'Mozilla/5.0'})
    web_page = urlopen(req).read()
    # Creating a BeautifulSoup object of the HTML page for easy extraction of data.
    soup = BeautifulSoup(web_page, 'html.parser')
    html = soup.prettify('utf-8')
    new_price = soup.find(id="quote-market-notice").find_parent().find("span").text
    # Volume
    vol = soup.find('td', attrs={'data-test': 'TD_VOLUME-value'})
    real_vol = vol.find('span', recursive=False)
    current_vol = real_vol.text.strip()
    saat = time.strftime('%c')
    # Saving values in lists
    price_list.append(new_price)
    vol_list.append(current_vol)
    time_list.append(saat)
    i += 1
Error:
  File "C:/Users/user/PycharmProjects/untitled/trader.py", line 29, in <module>
    # Creating a BeautifulSoup object of the HTML page for easy extraction of data.
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 563, in error
    result = self._call_chain(*args)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
Solution
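The traceback shows the 404 being raised inside urlopen after a 302 redirect was followed, so a single bad response ends the whole recording run. Why the server starts answering this way after a long session is not something the question establishes. One possible way to make the loop more tolerant, offered as a sketch rather than a confirmed fix, is to catch HTTPError, wait, and retry. The loop in the question also counts iterations rather than elapsed seconds, so the sketch uses a time.monotonic() deadline instead. The names fetch_with_retry and record_for, and the retries, delay and interval values, are hypothetical and not taken from the question.

```python
import time
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def fetch_with_retry(url, opener=urlopen, retries=3, delay=5.0, sleep=time.sleep):
    """Return the response body, retrying when the server answers with an HTTP error.

    Hypothetical helper: `opener` and `sleep` are injectable so the retry
    logic can be exercised without touching the network.
    """
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    for attempt in range(1, retries + 1):
        try:
            return opener(req).read()
        except HTTPError as err:
            if attempt == retries:
                raise  # out of attempts: let the caller see the error
            wait = delay * attempt  # wait a little longer after each failure
            print(f'HTTP {err.code} on attempt {attempt}, retrying in {wait} s')
            sleep(wait)


def record_for(seconds, fetch, interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Call fetch() repeatedly until `seconds` of elapsed time have passed.

    A sample that still fails after its retries is skipped instead of
    aborting the whole run.
    """
    samples = []
    deadline = clock() + seconds
    while clock() < deadline:
        try:
            samples.append(fetch())
        except HTTPError as err:
            print(f'Skipping sample: HTTP {err.code}')
        sleep(interval)
    return samples
```

To use it with the code from the question, something like record_for(maks, lambda: fetch_with_retry('http://finance.yahoo.com/quote/BTC-USD')) would return a list of raw pages, each of which can then be passed to BeautifulSoup as before. Pausing between requests may also reduce the chance of the server rejecting them, though that is an assumption about the site's behavior.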