python - Looking back over a preceding time window in a time series
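Problem description
I have a dataset like the one shown below. The idea is to look back over the preceding 15 minutes at every row, rather than bucketing at a fixed frequency as we do with the grouper function. I want to see the number of positive changes within the previous 15 minutes.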
row  Timestamp      Direction  Positive   Neg  Nut
1    1/20/19 12:15
2    1/20/19 12:17  Nut
3    1/20/19 12:17  Neg
4    1/20/19 12:18  Neg
5    1/20/19 12:19  Pos
6    1/20/19 12:20  Neg
7    1/20/19 12:21  Neg
8    1/20/19 12:22  Pos
9    1/20/19 12:23  Neg
10   1/20/19 12:24  Pos
11   1/20/19 12:25  Neg
12   1/20/19 12:26  Neg
13   1/20/19 12:27  Neg
14   1/20/19 12:29  Neg
15   1/20/19 12:29  Nut
16   1/20/19 12:30  Pos        4(o2:o16)  9    2
17   1/20/19 12:31  Nut        4(o3:o17)  9    3
18   1/20/19 12:32  Pos        5(o4:o18)  9    2
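In Excel I use =COUNTIF(Direction2:Direction16,"Pos") to compute the Positive column. I am not sure how to do this in a Pythonic way. When I try to apply the same logic, I end up grouping the data into 15-minute buckets, which is not what I want. In Excel, at every minute I check the preceding 15 minutes. Could someone tell me which approach I should follow? The goal is to produce the Positive, Neg and Nut columns; the Timestamp and Direction columns are given.
Error: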
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/usr/local/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3062 try:
-> 3063 return self._engine.get_loc(key)
3064 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'timestamp'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-87-d00f59bea382> in <module>()
2 #df['timestamp'] = pd.to_datetime(df.timestamp)
3 #df = df.set_index('timestamp')
----> 4 df['timestamp'] = pd.to_datetime(df['timestamp'])
5 df = df.set_index('timestamp')
6
/usr/local/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
2683 return self._getitem_multilevel(key)
2684 else:
-> 2685 return self._getitem_column(key)
2686
2687 def _getitem_column(self, key):
/usr/local/lib/python3.6/site-packages/pandas/core/frame.py in _getitem_column(self, key)
2690 # get column
2691 if self.columns.is_unique:
-> 2692 return self._get_item_cache(key)
2693
2694 # duplicate columns & possible reduce dimensionality
/usr/local/lib/python3.6/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
2484 res = cache.get(item)
2485 if res is None:
-> 2486 values = self._data.get(item)
2487 res = self._box_item_values(item, values)
2488 cache[item] = res
/usr/local/lib/python3.6/site-packages/pandas/core/internals.py in get(self, item, fastpath)
4113
4114 if not isna(item):
-> 4115 loc = self.items.get_loc(item)
4116 else:
4117 indexer = np.arange(len(self.items))[isna(self.items)]
/usr/local/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3063 return self._engine.get_loc(key)
3064 except KeyError:
-> 3065 return self._engine.get_loc(self._maybe_cast_indexer(key))
3066
3067 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'timestamp'
df.info()
RangeIndex: 31106 entries, 0 to 31105
Data columns (total 12 columns):
ID 31106 non-null int64
High 31106 non-null float64
Last 31106 non-null float64
Timestampvalue 31106 non-null int64
Bid 31106 non-null float64
VWap 31106 non-null float64
Volume 31106 non-null float64
Low 31106 non-null float64
Ask 31106 non-null float64
Openamt 31106 non-null float64
Type 31106 non-null object
timestamp 31106 non-null datetime64[ns]
dtypes: datetime64[ns](1), float64(8), int64(2), object(1)
memory usage: 2.8+ MB
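A note on the traceback above: `df.info()` lists a `timestamp` column, yet `df['timestamp']` raises `KeyError`. One possible cause (an assumption, since the full session is not shown) is that `set_index('timestamp')` had already run in an earlier execution of the cell, which moves the column into the index so that it can no longer be selected by name. A minimal sketch of a guard that works in either state, using a hypothetical helper `ensure_datetime_index` and made-up sample rows:

```python
import pandas as pd


def ensure_datetime_index(frame, column="timestamp"):
    """Return a copy indexed by `column`, whether it is still a column
    or has already been moved to the index (hypothetical helper)."""
    out = frame.copy()
    if column in out.columns:
        # Normal case: parse the column and move it into the index.
        out[column] = pd.to_datetime(out[column])
        out = out.set_index(column)
    elif out.index.name != column:
        # Neither a column nor the index: report what is available.
        raise KeyError(f"{column!r} not found; columns are {list(out.columns)}")
    # Time-based rolling windows need a monotonic index, so sort it.
    return out.sort_index()


# Made-up rows for illustration only.
raw = pd.DataFrame({
    "timestamp": ["1/20/19 12:17", "1/20/19 12:15"],
    "Direction": ["Nut", None],
})
once = ensure_datetime_index(raw)
twice = ensure_datetime_index(once)  # a second call does not raise KeyError
print(twice)
```

Printing `df.columns` and `df.index.name` before the failing line is a quick way to check whether this is what happened.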
Solution
You can use:
import numpy as np
import pandas as pd

# create a DatetimeIndex if necessary
# df = df.set_index('timestamp')

# one rolling 15-minute count per distinct Direction value:
# compare against the value, then sum the matches in each window
cols = df['Direction'].dropna().unique()
for c in cols:
    df[c] = df['Direction'].eq(c).astype(int).rolling('15min').sum()

# if necessary, set the first 14 minutes to NaN
df.loc[:df.index[0] + pd.Timedelta(14 * 60, unit='s'), cols] = np.nan
print(df)
row Direction Positive Neg Nut Pos
timestamp
2019-01-20 12:15:00 1 NaN NaN NaN NaN NaN
2019-01-20 12:17:00 2 Nut NaN NaN NaN NaN
2019-01-20 12:17:00 3 Neg NaN NaN NaN NaN
2019-01-20 12:18:00 4 Neg NaN NaN NaN NaN
2019-01-20 12:19:00 5 Pos NaN NaN NaN NaN
2019-01-20 12:20:00 6 Neg NaN NaN NaN NaN
2019-01-20 12:21:00 7 Neg NaN NaN NaN NaN
2019-01-20 12:22:00 8 Pos NaN NaN NaN NaN
2019-01-20 12:23:00 9 Neg NaN NaN NaN NaN
2019-01-20 12:24:00 10 Pos NaN NaN NaN NaN
2019-01-20 12:25:00 11 Neg NaN NaN NaN NaN
2019-01-20 12:26:00 12 Neg NaN NaN NaN NaN
2019-01-20 12:27:00 13 Neg NaN NaN NaN NaN
2019-01-20 12:29:00 14 Neg NaN NaN NaN NaN
2019-01-20 12:29:00 15 Nut NaN NaN NaN NaN
2019-01-20 12:30:00 16 Pos 4(o2:o16) 9.0 2.0 4.0
2019-01-20 12:31:00 17 Nut 4(o3:o17) 9.0 3.0 4.0
2019-01-20 12:32:00 18 Pos 5(o4:o18) 8.0 2.0 5.0
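For readers who want to try the rolling-count idea without the original file, here is a self-contained sketch. The sample frame is rebuilt by hand from the 18 rows in the question, and the function name `rolling_direction_counts` is hypothetical. It does not blank out the first 14 minutes, so early rows show partial-window counts rather than NaN.

```python
import pandas as pd

# Sample data retyped from the question (rows 1-18); row 1 has no Direction.
times = ["12:15", "12:17", "12:17", "12:18", "12:19", "12:20",
         "12:21", "12:22", "12:23", "12:24", "12:25", "12:26",
         "12:27", "12:29", "12:29", "12:30", "12:31", "12:32"]
directions = [None, "Nut", "Neg", "Neg", "Pos", "Neg",
              "Neg", "Pos", "Neg", "Pos", "Neg", "Neg",
              "Neg", "Neg", "Nut", "Pos", "Nut", "Pos"]

df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2019-01-20 " + t for t in times]),
    "Direction": directions,
}).set_index("timestamp")


def rolling_direction_counts(frame, window="15min"):
    """Add one column per distinct Direction value holding the number of
    matching rows in the time window that ends at each row."""
    out = frame.copy()
    for label in out["Direction"].dropna().unique():
        # 1 where the row matches the label, 0 otherwise (missing counts as 0).
        hits = out["Direction"].eq(label).astype(int)
        # A time-based window covers (t - 15min, t] for the row at time t.
        out[label] = hits.rolling(window).sum()
    return out


counts = rolling_direction_counts(df)
print(counts.tail(3))
```

Because the window is anchored at each row's own timestamp, every row gets its own look-back count, which is the behaviour of the Excel COUNTIF approach and differs from grouping into fixed 15-minute buckets.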