time-series - How to reduce the long running time of a pandas rolling apply over multiple columns - pandas
Problem description
I am working with time series data and am trying to apply a percent change to it.
Here is a snapshot of the data:
Time EX SC WH YE Lt Ub Yl_2 Wm Wm_2 value
2016-02-15 11:54:00 UTC 4.4 0.14 8.38 755 232 0.009 0.11 1428 1020 FALSE
2016-02-15 11:55:00 UTC 4.4 0.14 8.38 755 232 0.009 0.111 1436 1018 FALSE
2016-02-15 11:56:00 UTC 4.4 0.14 8.38 755 232 0.014 0.113 1471 1019 FALSE
2016-02-15 11:57:00 UTC 4.4 0.14 8.37 755 232 0.015 0.111 1457 1015 FALSE
2016-02-15 11:58:00 UTC 4.4 0.14 8.38 755 232 0.013 0.111 1476 1019 FALSE
2016-02-15 11:59:00 UTC 4.4 0.14 8.36 755 232 0.013 0.114 1416 1015 FALSE
The shape of the data is (122334, 10).
Here is my function:
import numpy as np

def percent_change(series):
    # Collect all *but* the last value of this window, then the final value
    previous_values = series[:-1]
    last_value = series[-1]
    # Calculate the % difference between the last value and the mean of earlier values
    percent_change = (last_value - np.mean(previous_values)) / np.mean(previous_values)
    return percent_change
The function is applied here:
df2 = df.rolling(10).apply(percent_change)
It takes forever to run. What am I doing wrong, or what should I do instead?
Thanks
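Solution
Here is an efficient way to compute the mean using shift() and rolling(). Both are vectorized, so no Python function is called per window: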
import pandas as pd

def rolling_pct_change(df, field):
    t = df.copy()
    # Mean of the 3 values preceding each row (shift(1) excludes the current row)
    t['mean'] = t[field].shift(1).rolling(3).mean()
    t['pct_change'] = (t[field] - t['mean']) / t['mean']
    return t

df = pd.DataFrame({'x': [*range(10)]})
df2 = rolling_pct_change(df, 'x')
print(df2)
   x  mean  pct_change
0  0   NaN         NaN
1  1   NaN         NaN
2  2   NaN         NaN
3  3   1.0    2.000000
4  4   2.0    1.000000
5  5   3.0    0.666667
6  6   4.0    0.500000
7  7   5.0    0.400000
8  8   6.0    0.333333
9  9   7.0    0.285714
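The example above handles a single column with a window of 3. The question applies rolling(10) to every column, which means each value is compared with the mean of the 9 values before it. The same shift() and rolling() idea can probably be extended to the whole frame at once, as in the sketch below. The function name rolling_pct_change_all and the random demo frame are hypothetical stand-ins, not part of the original question or answer. Selecting only numeric columns is an assumption made here so that a boolean column such as value is skipped.

```python
import numpy as np
import pandas as pd


def rolling_pct_change_all(df, window=10):
    """Percent change of each value relative to the mean of the
    (window - 1) values that precede it, for every numeric column."""
    # Assumption: only numeric columns are of interest (bool columns are excluded).
    numeric = df.select_dtypes(include="number")
    # shift(1) drops the current row from the window, so rolling(window - 1)
    # averages exactly the values that series[:-1] covered in the question.
    prev_mean = numeric.shift(1).rolling(window - 1).mean()
    return (numeric - prev_mean) / prev_mean


# Hypothetical stand-in data; the real frame has shape (122334, 10).
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.uniform(1.0, 2.0, size=(50, 3)), columns=["a", "b", "c"])

fast = rolling_pct_change_all(demo, window=10)


# Reference: the per-window function from the question, fed NumPy arrays
# via raw=True, used here only to check that both approaches agree.
def percent_change(values):
    previous_mean = np.mean(values[:-1])
    return (values[-1] - previous_mean) / previous_mean


slow = demo.rolling(10).apply(percent_change, raw=True)
assert np.allclose(fast, slow, equal_nan=True)
```

If the per-window function cannot be rewritten in vectorized form, passing raw=True to apply() hands each window to the function as a NumPy array instead of a Series, which usually lowers the per-call overhead, although it still calls Python once per window and per column.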