Web scraping stock prices - Yahoo Finance

Problem description

With my code I can get live stock prices from Yahoo Finance.

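My variable 'maks' sets how many seconds of live data to record. This works fine up to 2000 seconds (roughly 2000 price ticks, since each loop iteration fetches one price).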

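However, when I set a longer period, for example 2 hours or more, the script fails with an HTTP error (full traceback below the code). Here is the code: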

from bs4 import BeautifulSoup
import ssl
import time
from urllib.request import Request, urlopen

# For ignoring SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

maks = int(input('Enter time to record data (seconds) : '))

# Lists for collected values
price_list = []
vol_list = []
time_list = []

print("Parsing data, please wait..")
start = time.time()

i = 0
# One request per iteration with no pause in between, so maks is effectively a request count
while i < maks:

    # Making the website believe that you are accessing it using a Mozilla browser
    req = Request('http://finance.yahoo.com/quote/BTC-USD', headers={'User-Agent': 'Mozilla/5.0'})

    # Pass the SSL context so the settings above actually apply (also after a redirect to https)
    web_page = urlopen(req, context=ctx).read()

    # Creating a BeautifulSoup object of the HTML page for easy extraction of data.
    soup = BeautifulSoup(web_page, 'html.parser')

    # Price
    new_price = soup.find(id="quote-market-notice").find_parent().find("span").text

    # Volume
    vol = soup.find('td', attrs={'data-test': 'TD_VOLUME-value'})
    real_vol = vol.find('span', recursive=False)
    current_vol = real_vol.text.strip()

    saat = time.strftime('%c')

    # Saving values in lists
    price_list.append(new_price)
    vol_list.append(current_vol)
    time_list.append(saat)

    i += 1

Error:

  File "C:/Users/user/PycharmProjects/untitled/trader.py", line 29, in <module>
    # Creating a BeautifulSoup object of the HTML page for easy extraction of data.
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 563, in error
    result = self._call_chain(*args)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

Tags: python, web-scraping, beautifulsoup

Solution


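You have probably hit the API call limit mentioned in the answers to another thread: public (unauthenticated) access is capped at 2,000 requests per hour per IP. Your loop sends one request per iteration with no pause, so a run of about 2,000 iterations already uses up that hourly budget, which is roughly where your script starts failing. See: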

What is the query limit on Yahoo's Finance API?
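If that limit is the cause, one way to keep a long run under it is to space the requests out. This is a minimal sketch that assumes the 2,000 requests/hour figure from the linked thread still applies (Yahoo may change or not enforce it): spacing requests at least 3600 / 2000 = 1.8 seconds apart stays within the budget. Throttle, REQUESTS_PER_HOUR and MIN_INTERVAL are hypothetical names, not part of any library; the clock and sleep functions are injectable so the timing can be checked without actually waiting.

```python
import time

# Assumed budget, taken from the linked thread: 2,000 requests per hour per IP.
# Yahoo may change this figure or not enforce it; treat it as a guess.
REQUESTS_PER_HOUR = 2000
MIN_INTERVAL = 3600 / REQUESTS_PER_HOUR  # 1.8 seconds between requests


class Throttle:
    """Hypothetical helper: keeps successive wait() calls at least `interval` seconds apart."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock  # injectable so the timing can be tested without real waiting
        self.sleep = sleep
        self._last = None  # time of the previous request; None before the first one

    def wait(self):
        """Sleep only as long as needed, then record the time of this request."""
        now = self.clock()
        if self._last is not None:
            remaining = self._last + self.interval - now
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now
```

In the question's loop you would create throttle = Throttle(MIN_INTERVAL) before while i < maks: and call throttle.wait() as the first statement of each iteration. Because the interval is measured between the starts of consecutive requests, time spent downloading and parsing counts toward the 1.8 seconds instead of being added on top. Whether this avoids the 404 depends on the rate limit really being the cause.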

More about it:

Usage Information and Limits
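As written, the script stops on the first urllib.error.HTTPError, and everything collected in price_list, vol_list and time_list is lost because it only lives in memory. A hedged sketch of a retry wrapper: on an HTTPError it pauses with exponential backoff and tries again, and after a few attempts it re-raises so the caller can stop cleanly. fetch_with_backoff, the retry count and the 60-second base delay are hypothetical choices, not values Yahoo documents; urllib.error.HTTPError is the real exception shown in the traceback above.

```python
import time
import urllib.error


def fetch_with_backoff(fetch, retries=3, base_delay=60.0, sleep=time.sleep):
    """Hypothetical helper: call fetch(); on urllib.error.HTTPError, wait and retry.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... seconds between attempts.
    After `retries` retries the last HTTPError is re-raised so the caller can stop
    cleanly (for example after writing out the data collected so far).
    """
    for attempt in range(retries + 1):
        try:
            return fetch()
        except urllib.error.HTTPError:
            if attempt == retries:
                raise  # give up: propagate the original error
            sleep(base_delay * 2 ** attempt)
```

In the loop this would replace the direct call, e.g. web_page = fetch_with_backoff(lambda: urlopen(req, context=ctx).read()). If the limit really is per hour, a longer base delay, or stopping and saving the lists to a file, may be more practical than short retries.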

