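python - Problem reading a pandas DataFrame into Stocker
Question
I recently started working on a project that uses Stocker (an API built on fbprophet for running machine learning on stock data). I like the API's simplicity, but it has one fatal flaw: it uses quandl to fetch its stock data. Quandl stopped updating its data sometime in 2018, and you cannot build an accurate model on stale data. I looked through the Stocker code and, as far as I can tell, it only uses quandl in one line: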
stock = quandl.get('%s/%s' % (exchange, ticker))
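This line returns the stock data from quandl as a pandas DataFrame. Since that is all quandl is used for, I figured I could write my own quandl-like module that fetches the data from a different source (IEX) and returns it as a DataFrame. I wrote the code (attached below), but I keep getting this error when creating a model in Stocker: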
File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Date'
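I'm pretty lost on this and not very familiar with pandas. Any help is greatly appreciated!
Relevant part of Stocker, showing where the stock data is fetched (originally via quandl):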
# Quandl for financial analysis, pandas and numpy for data manipulation
# fbprophet for additive models, #pytrends for Google trend data
#import quandl
import stockdata
import pandas as pd
import numpy as np
import fbprophet
import pytrends
from pytrends.request import TrendReq
# matplotlib pyplot for plotting
import matplotlib.pyplot as plt
import matplotlib
# Class for analyzing and (attempting) to predict future prices
# Contains a number of visualizations and analysis methods
class Stocker():
    # Initialization requires a ticker symbol
    def __init__(self, ticker, exchange='IEX'):
        # Enforce capitalization
        ticker = ticker.upper()
        # Symbol is used for labeling plots
        self.symbol = ticker
        # Use Personal Api Key
        # quandl.ApiConfig.api_key = 'YourKeyHere'
        # Retrieval the financial data
        try:
            stock = stockdata.get(ticker)
            print(stock)
        except Exception as e:
            print('Error Retrieving Data.')
            print(e)
            return
        # Set the index to a column called Date
        stock = stock.reset_index(level=0)
        # Columns required for prophet
        stock['ds'] = stock['Date']
        if ('Adj. Close' not in stock.columns):
            stock['Adj. Close'] = stock['Close']
            stock['Adj. Open'] = stock['Open']
        stock['y'] = stock['Adj. Close']
        stock['Daily Change'] = stock['Adj. Close'] - stock['Adj. Open']
        # Data assigned as class attribute
        self.stock = stock.copy()
        # Minimum and maximum date in range
        self.min_date = min(stock['Date'])
        self.max_date = max(stock['Date'])
        # Find max and min prices and dates on which they occurred
        self.max_price = np.max(self.stock['y'])
        self.min_price = np.min(self.stock['y'])
        self.min_price_date = self.stock[self.stock['y'] == self.min_price]['Date']
        self.min_price_date = self.min_price_date[self.min_price_date.index[0]]
        self.max_price_date = self.stock[self.stock['y'] == self.max_price]['Date']
        self.max_price_date = self.max_price_date[self.max_price_date.index[0]]
        # The starting price (starting with the opening price)
        self.starting_price = float(self.stock.ix[0, 'Adj. Open'])
        # The most recent price
        self.most_recent_price = float(self.stock.ix[len(self.stock) - 1, 'y'])
        # Whether or not to round dates
        self.round_dates = True
        # Number of years of data to train on
        self.training_years = 3
        # Prophet parameters
        # Default prior from library
        self.changepoint_prior_scale = 0.05
        self.weekly_seasonality = False
        self.daily_seasonality = False
        self.monthly_seasonality = True
        self.yearly_seasonality = True
        self.changepoints = None
        print('{} Stocker Initialized. Data covers {} to {}.'.format(self.symbol,
                                                                     self.min_date.date(),
                                                                     self.max_date.date()))
Quandl's get function
def get(dataset, **kwargs):
    """Return dataframe of requested dataset from Quandl.
    :param dataset: str or list, depending on single dataset usage or multiset usage
            Dataset codes are available on the Quandl website
    :param str api_key: Downloads are limited to 50 unless api_key is specified
    :param str start_date, end_date: Optional datefilers, otherwise entire
           dataset is returned
    :param str collapse: Options are daily, weekly, monthly, quarterly, annual
    :param str transform: options are diff, rdiff, cumul, and normalize
    :param int rows: Number of rows which will be returned
    :param str order: options are asc, desc. Default: `asc`
    :param str returns: specify what format you wish your dataset returned as,
        either `numpy` for a numpy ndarray or `pandas`. Default: `pandas`
    :returns: :class:`pandas.DataFrame` or :class:`numpy.ndarray`
    Note that Pandas expects timeseries data to be sorted ascending for most
    timeseries functionality to work.
    Any other `kwargs` passed to `get` are sent as field/value params to Quandl
    with no interference.
    """
    _convert_params_to_v3(kwargs)
    data_format = kwargs.pop('returns', 'pandas')
    ApiKeyUtil.init_api_key_from_args(kwargs)
    # Check whether dataset is given as a string
    # (for a single dataset) or an array (for a multiset call)
    # Unicode String
    if isinstance(dataset, string_types):
        dataset_args = _parse_dataset_code(dataset)
        if dataset_args['column_index'] is not None:
            kwargs.update({'column_index': dataset_args['column_index']})
        data = Dataset(dataset_args['code']).data(params=kwargs, handle_column_not_found=True)
    # Array
    elif isinstance(dataset, list):
        args = _build_merged_dataset_args(dataset)
        # handle_not_found_error if set to True will add an empty DataFrame
        # for a non-existent dataset instead of raising an error
        data = MergedDataset(args).data(params=kwargs,
                                        handle_not_found_error=True,
                                        handle_column_not_found=True)
    # If wrong format
    else:
        raise InvalidRequestError(Message.ERROR_DATASET_FORMAT)
    if data_format == 'numpy':
        return data.to_numpy()
    return data.to_pandas()
def _parse_dataset_code(dataset):
    if '.' not in dataset:
        return {'code': dataset, 'column_index': None}
    dataset_temp = dataset.split('.')
    if not dataset_temp[1].isdigit():
        raise ValueError(Message.ERROR_COLUMN_INDEX_TYPE % dataset)
    return {'code': dataset_temp[0], 'column_index': int(dataset_temp[1])}
My makeshift get function
import pandas_datareader.data as web
from datetime import date, timedelta
start = date.today()-timedelta(days=1080)
end = date.today()
def get(ticker):
    df = web.DataReader(name=ticker.upper(), data_source='iex', start=start, end=end)
    return df
Answer
The problem comes down to the columns, and the column names, that Quandl and IEX return.
Quandl returns:
Date Open High Low Close Volume Ex-Dividend Split Ratio Adj. Open Adj. High Adj. Low Adj. Close Adj. Volume
whereas IEX returns:
date open high low close volume
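To see how that difference produces the KeyError: 'Date' from the question, here is a minimal sketch. The frame is synthetic, and it assumes (this is not confirmed above) that the IEX reader hands back the dates as an index named date, which Stocker's reset_index step then turns into a lowercase date column.

```python
import pandas as pd

# Synthetic stand-in for an IEX-style frame (assumption): lowercase columns,
# string dates stored in an index named 'date'
iex_like = pd.DataFrame(
    {
        "open": [10.0, 10.5],
        "high": [10.8, 11.0],
        "low": [9.9, 10.2],
        "close": [10.6, 10.9],
        "volume": [1000, 1200],
    },
    index=pd.Index(["2019-01-02", "2019-01-03"], name="date"),
)

# Same step Stocker performs: move the index into a regular column
stock = iex_like.reset_index(level=0)
columns_after_reset = list(stock.columns)
print(columns_after_reset)

# Stocker then looks up 'Date' with a capital D, which is not among the columns
try:
    stock["Date"]
    lookup_failed = False
except KeyError as err:
    lookup_failed = True
    print("KeyError:", err)
```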
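IEX returns adjusted prices, so you can, for example, map the IEX 'close' column to Quandl's 'Adj. Close'.
So if you want to keep using the Stocker format (i.e. the Quandl format), you can create the required columns like this: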
# >-Quandl format-< >-- IEX --<
stock['Adj. Close'] = stock['close']
stock['Date'] = stock['date']
etc...
Note that you may need to convert the string dates from IEX to datetime format.
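Putting the two points together, a small adapter could rename the columns and parse the dates before the frame ever reaches Stocker. This is only a sketch: to_quandl_format and IEX_TO_QUANDL are hypothetical names, the sample frame is made up, and the string index named date is an assumption about what the IEX reader returns.

```python
import pandas as pd

# Hypothetical mapping from IEX-style names to the Quandl-style names Stocker reads
IEX_TO_QUANDL = {
    "open": "Open",
    "high": "High",
    "low": "Low",
    "close": "Close",
    "volume": "Volume",
}


def to_quandl_format(df):
    """Return a new frame shaped like Quandl output, built from an IEX-style one."""
    out = df.rename(columns=IEX_TO_QUANDL)
    # If the dates arrived as a regular column, move them into the index first
    if "date" in out.columns:
        out = out.set_index("date")
    # Parse string dates so that .date(), min() and max() behave as Stocker expects
    out.index = pd.to_datetime(out.index)
    out.index.name = "Date"
    # IEX prices are treated as already adjusted, as described above
    out["Adj. Open"] = out["Open"]
    out["Adj. Close"] = out["Close"]
    return out


# Made-up sample data, only to exercise the function
sample = pd.DataFrame(
    {
        "open": [10.0, 10.5],
        "high": [10.8, 11.0],
        "low": [9.9, 10.2],
        "close": [10.6, 10.9],
        "volume": [1000, 1200],
    },
    index=pd.Index(["2019-01-02", "2019-01-03"], name="date"),
)

converted = to_quandl_format(sample)
# Stocker's own reset_index call now produces a 'Date' column
stock = converted.reset_index(level=0)
print(stock.columns.tolist())
```

If this fits your data, the get function in stockdata could return to_quandl_format(df) instead of df, leaving Stocker itself untouched.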