python - How can I calculate an exponential moving average faster in Python?
Problem description
I am busy building backtesting software, but I am running into trouble creating an exponential moving average. I managed to create it with a for loop, but it takes about 20 seconds to run for every symbol I want to test (far too long).
I am trying to find a faster solution, so any suggestions are welcome.
My current code looks like this, but it does not produce the correct result:
def exponential_moving_average(df, period):
    # Create a copy of the original dataframe to work with.
    dataframe = df.copy()
    dataframe['EMA'] = dataframe['Close'].ewm(span=period,
                                              adjust=False,
                                              min_periods=period,
                                              ignore_na=True
                                              ).mean()
    return dataframe['EMA']
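For what it's worth, the function above can be exercised standalone against the Close prices from the sample table below; this is a minimal sketch (assuming pandas is installed) rather than the full indicator class:

```python
# Minimal check of the question's .ewm() call against the sample
# Close prices, outside the indicator class (assumes pandas).
import pandas as pd

def exponential_moving_average(df, period):
    # Create a copy of the original dataframe to work with.
    dataframe = df.copy()
    dataframe['EMA'] = dataframe['Close'].ewm(span=period,
                                              adjust=False,
                                              min_periods=period,
                                              ignore_na=True
                                              ).mean()
    return dataframe['EMA']

close = [1.43247, 1.44120, 1.43650, 1.44060, 1.43070,
         1.44160, 1.44110, 1.45120, 1.44840, 1.45100,
         1.44990, 1.43790, 1.43480, 1.43830, 1.42870]
df = pd.DataFrame({'Close': close})

ema = exponential_moving_average(df, period=10)
# min_periods=10 masks the first 9 rows as NaN; the 10th value
# matches slow_ma on 2010-01-13 in the first table (~1.442916).
print(round(ema.iloc[9], 6))
```

Note that `adjust=False` makes `.ewm()` use the plain recursion y[0] = x[0], y[t] = (1-a)*y[t-1] + a*x[t] with a = 2/(span+1), which is what the first table below reflects.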
This method lives in an indicator class, and its inputs are as follows: df holds the daily Open, High, Low and Close prices, plus any other indicators that will be used for backtesting; period is the "window", i.e. the number of days over which the exponential moving average must be calculated.
Here is a snippet of df with values:
symbol Open High Low Close ATR slow_ma
Date
2010-01-03 EURUSD 1.43075 1.43369 1.43065 1.43247 NaN NaN
2010-01-04 EURUSD 1.43020 1.44560 1.42570 1.44120 NaN NaN
2010-01-05 EURUSD 1.44130 1.44840 1.43460 1.43650 NaN NaN
2010-01-06 EURUSD 1.43660 1.44350 1.42820 1.44060 NaN NaN
2010-01-07 EURUSD 1.44070 1.44470 1.42990 1.43070 NaN NaN
2010-01-08 EURUSD 1.43080 1.44380 1.42630 1.44160 NaN NaN
2010-01-10 EURUSD 1.44245 1.44252 1.44074 1.44110 NaN NaN
2010-01-11 EURUSD 1.44280 1.45560 1.44080 1.45120 NaN NaN
2010-01-12 EURUSD 1.45120 1.45450 1.44530 1.44840 NaN NaN
2010-01-13 EURUSD 1.44850 1.45790 1.44570 1.45100 NaN 1.442916
2010-01-14 EURUSD 1.45090 1.45550 1.44460 1.44990 NaN 1.444186
2010-01-15 EURUSD 1.45000 1.45110 1.43360 1.43790 NaN 1.443043
2010-01-17 EURUSD 1.43597 1.43655 1.43445 1.43480 NaN 1.441544
2010-01-18 EURUSD 1.43550 1.44000 1.43340 1.43830 NaN 1.440954
2010-01-19 EURUSD 1.43820 1.44130 1.42520 1.42870 NaN 1.438726
This is the expected result for slow_ma (10 days):
symbol Open High Low Close ATR slow_ma
Date
2010-01-03 EURUSD 1.43075 1.43369 1.43065 1.43247 NaN NaN
2010-01-04 EURUSD 1.43020 1.44560 1.42570 1.44120 NaN NaN
2010-01-05 EURUSD 1.44130 1.44840 1.43460 1.43650 NaN NaN
2010-01-06 EURUSD 1.43660 1.44350 1.42820 1.44060 NaN NaN
2010-01-07 EURUSD 1.44070 1.44470 1.42990 1.43070 NaN NaN
2010-01-08 EURUSD 1.43080 1.44380 1.42630 1.44160 NaN NaN
2010-01-10 EURUSD 1.44245 1.44252 1.44074 1.44110 NaN NaN
2010-01-11 EURUSD 1.44280 1.45560 1.44080 1.45120 NaN NaN
2010-01-12 EURUSD 1.45120 1.45450 1.44530 1.44840 NaN NaN
2010-01-13 EURUSD 1.44850 1.45790 1.44570 1.45100 NaN 1.44351
2010-01-14 EURUSD 1.45090 1.45550 1.44460 1.44990 NaN 1.44467
2010-01-15 EURUSD 1.45000 1.45110 1.43360 1.43790 NaN 1.44344
2010-01-17 EURUSD 1.43597 1.43655 1.43445 1.43480 NaN 1.44187
2010-01-18 EURUSD 1.43550 1.44000 1.43340 1.43830 NaN 1.44122
2010-01-19 EURUSD 1.43820 1.44130 1.42520 1.42870 NaN 1.43894
I have changed the values in the first dataframe so that it shows the values used to calculate slow_ma.
This is my first post on Stack Overflow, so please ask if anything is unclear.
Solution
How to calculate an exponential moving average with python faster?
Speeds under ~ 50 [us] per call are achievable for your sized data / period, even on an old 2.6 [GHz] i5 device.
Step 0: Get the results ( the process ) past Quality Assurance
Having fast but wrong data has negative added value, right?
Given you are using a "hardwired" .ewm() method, you can only re-read its parametrisation options to see whether different dataframe['Close'] column-processing modes are possible.
As a fast check:
aPV = [ 1.43247,  # borrowed from dataframe['Close']
        1.44120,
        1.43650,
        1.44060, 1.43070, 1.44160, 1.44110, 1.45120, 1.44840,
        1.45100, 1.44990, 1.43790, 1.43480, 1.43830, 1.42870,
        ]
|>>> QuantFX.numba_EMA_fromPrice2( N_period     = 10,
                                   aPriceVECTOR = QuantFX.np.array( aPV )
                                   )
array([
1.43247 ,
1.43405727,
1.4345014 ,
1.43561024,
1.43471747,
1.43596884,
1.43690178,
1.43950145,
1.44111937,
1.44291585,
1.44418569,
1.44304284,
1.44154414,
1.4409543 ,
1.43872624
]
)
for which there are some ~ +/- 3E-7 numerical-representation differences from the values in the first table above ( i.e. about two orders of magnitude below the least significant digit shown ).
|>>> ( QuantFX.numba_EMA_fromPrice2( 10,
                                     QuantFX.np.array( aPV )
                                     )
       - QuantFX.np.array( slow_EMA_1 )  # values borrowed from Table 1 above
       )
array([ nan,
nan,
nan,
nan,
nan,
nan,
nan,
nan,
nan,
-1.50656152e-07,
-3.05082306e-07,
-1.58703705e-07,
1.42878787e-07,
2.98719007e-07,
2.44406460e-07
]
)
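The QuantFX.numba_EMA_fromPrice2 routine itself is not listed in the answer. A minimal re-implementation of the same recursive, adjust=False-style EMA, which numba's @njit can JIT-compile when installed, might look like this (the function name and the optional-numba fallback here are assumptions, not the answer's actual code):

```python
import numpy as np

try:                        # numba is optional for this sketch; the
    from numba import njit  # pure-Python fallback keeps it runnable
except ImportError:         # even without numba installed
    def njit(f):
        return f

@njit
def numba_EMA_fromPrice(N_period, aPriceVECTOR):
    # Recursive EMA, equivalent to pandas .ewm(span=N_period, adjust=False)
    alpha = 2.0 / (N_period + 1)
    out = np.empty_like(aPriceVECTOR)
    out[0] = aPriceVECTOR[0]
    for i in range(1, aPriceVECTOR.shape[0]):
        out[i] = alpha * aPriceVECTOR[i] + (1.0 - alpha) * out[i - 1]
    return out

aPV = np.array([1.43247, 1.44120, 1.43650, 1.44060, 1.43070,
                1.44160, 1.44110, 1.45120, 1.44840, 1.45100,
                1.44990, 1.43790, 1.43480, 1.43830, 1.42870])
ema = numba_EMA_fromPrice(10, aPV)
# ema[9] comes out near 1.4429158, matching the array listed above
```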
Step 1: Tweak the ( QA-confirmed ) processing for better speed
During this phase, a lot depends on the outer context of use.
Best results could be expected from cythonize(), yet profiling may show surprises on the fly.
Without moving the processing into cython code, one can get interesting speedups from a global use of float64-s instead of float32-s ( this shaved off some 110 ~ 200 [us] on similar EMA depths ), from vectorised in-place assignments ( ~ 2x speedup, from ~ 100 [us] to ~ 50 [us], thanks to better combined memory allocation of the resulting vector and its vectorised value-processing ), and, best of all, from any mathematical re-formulation that lets some merely "mechanical" operations be skipped altogether.
Yet all the speedup tricks depend on the tools used - whether a pure numpy, a numpy + numba ( which may yield negative effects on processing as trivial as the EMA in question, there being not much mathematical "meat" to actually number-crunch ), or a cython-optimised solution - so profiling in the target CPU-context is a must if the best results are to be delivered.
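That profiling advice can be followed with nothing more than the standard library's timeit; the sketch below (assuming numpy and pandas are installed, with a toy payload in place of real quotes) first QA-checks that both paths agree, then times them, and the absolute numbers will of course differ per machine:

```python
import timeit
import numpy as np
import pandas as pd

prices = np.random.rand(15) + 1.0   # toy payload of aPV-like size
s = pd.Series(prices)

def ema_numpy(span, x):
    # same recursion as .ewm(span=span, adjust=False).mean()
    alpha = 2.0 / (span + 1)
    out = np.empty_like(x)
    out[0] = x[0]
    for i in range(1, x.shape[0]):
        out[i] = alpha * x[i] + (1.0 - alpha) * out[i - 1]
    return out

pandas_result = s.ewm(span=10, adjust=False).mean().to_numpy()
numpy_result = ema_numpy(10, prices)

# QA first: both paths must agree before timing them
assert np.allclose(pandas_result, numpy_result)

t_pandas = timeit.timeit(lambda: s.ewm(span=10, adjust=False).mean(),
                         number=1000)
t_numpy = timeit.timeit(lambda: ema_numpy(10, prices), number=1000)
print(f"pandas: {t_pandas * 1000:.1f} us/call, "
      f"numpy loop: {t_numpy * 1000:.1f} us/call")
```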
trying to find a faster solution ...
It would be interesting to update your post with your expected target speedup - or, better, a target per-call processing cost in the [TIME]-domain for the stated problem, at the given [SPACE]-domain scale of data ( window == 10, aPriceVECTOR.shape[0] ~ 15 ) - and to state whether the target code-execution platform has any hardware / CPU / cache-hierarchy constraints, because building a backtester platform massively amplifies any and all code-design and code-execution inefficiencies.
Given the EMA is reasonably efficient, tools may get a ~ 4x speedup
The QuantFX story went from ~ 42000 [us] down to ~ 21000 [us] without numba/JIT tools, just by re-formulated and memory-optimised vector processing ( using an artificially sized workload, processing a block of aPV[:10000] ).
Next, the run-time went down further, to ~ 10600 [us], using the as-is CPython code-base, just with permission to auto-cythonise, where possible, the import-ed code using pyximport:
pass; import pyximport
pass; pyximport.install( pyimport = True )
from QuantFX import numba_EMA_fromPrice2
...
So, one can get speeds of ~ 45 ~ 47 [us] for your sized data aPV[:15], period = 10, on an ordinary 2.6 [GHz] i5 device.
If you insist on using pandas dataframe tools and methods, your performance is principally in the hands of the pandas team; there is not much to be done here about their design compromises, which had to be made under the ever-present dilemma between speed and universality.